javascript - Error invoking user-provided 'pageFunction': Error: TypeError: JSON.stringify cannot serialize cyclic structures

Question

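I am using Apify, a headless browser service, to write a web-scraping crawler; it is JavaScript.

I am trying to collect the article content of several hundred articles I have published on a blog.

The crawler works by specifying, in Apify's web interface, Start and List pages, which are paginated indexes containing links to articles, plus a URL pattern for the target articles it should crawl from there...

Using the names I chose...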

Start: https://www.example.com/author/myname
List: https://www.example.com/author/myname/page/[\d+]
Detail: https://www.example.com/[\d+]/[\d+]/[a-z0-9]+(?:-[a-z0-9]+)*.html$

Here is the crawler code...

function pageFunction(context) {

    // Called on every page the crawler visits, use it to extract data from it
    var $ = context.jQuery;

    // If page is START or a LIST,
    if (context.request.label === 'START' || context.request.label === 'LIST') {

        context.skipOutput();

        // First, gather LIST page
        $('a.page-numbers').each(function() {
            context.enqueuePage({
                url: /*window.location.origin +*/ $(this).attr('href'),
                label: 'LIST'
            });
        });

        // Then, gather every DETAIL page
        $('h3>a').each(function(){
            context.enqueuePage({
                url: /*window.location.origin +*/ $(this).attr('href'),
                label: 'DETAIL'
            });
        });

    // If page is actually a DETAIL target page
    } else if (context.request.label === 'DETAIL') {

        result = {
            "title": $('h1')
        };

    }
    return result;
}

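I think this structure is probably roughly right.

From START and LIST, this correctly identifies the URLs to crawl, so that is not the problem. Apify's behavior is that pageFunction fires for every page from which data is to be extracted. For testing, my goal is to extract only the H1 tag of each page.

The problem is that for every one (that is, whenever pageFunction executes), instead of returning the H1 tag, the crawler returns...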

Error invoking user-provided 'pageFunction': Error: TypeError: JSON.stringify cannot serialize cyclic structures.

I have read up on JSON.stringify, but I don't fully understand the problem.
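
For reference, the error text itself can be reproduced outside Apify: JSON.stringify throws a TypeError whenever the value it is given contains a reference back to itself. The sketch below is only an illustration under assumptions. The objects `parent` and `heading` and the helper `trySerialize` are hypothetical stand-ins for a DOM-backed value (such as the jQuery object that `$('h1')` produces), and it assumes, without confirmation from Apify, that the crawler serializes whatever pageFunction returns.

```javascript
// Hypothetical stand-ins for DOM-like nodes: the heading points at its
// parent, and the parent points back at the heading, forming a cycle.
const parent = { tagName: 'BODY', children: [] };
const heading = { tagName: 'H1', textContent: 'My article title', parentNode: parent };
parent.children.push(heading);

// Hypothetical helper: attempt to serialize a value and report the outcome
// instead of letting the exception propagate.
function trySerialize(value) {
    try {
        return { ok: true, json: JSON.stringify(value) };
    } catch (err) {
        return { ok: false, error: err.name };
    }
}

// Passing the node-like object itself fails, because of the cycle.
const cyclicResult = trySerialize({ title: heading });

// Passing only the plain string taken from the node serializes fine.
const plainResult = trySerialize({ title: heading.textContent });

console.log(cyclicResult);
console.log(plainResult);
```

If the assumption holds, the analogous change in the pageFunction would be to return a plain string, for example `$('h1').text()`, rather than the jQuery object `$('h1')`.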
