jquery - How do I correct the selector in Apify to get data from a JSON data link?

Question

I am using Apify to fetch data from a link to a JSON file. This is the JSON data:

<html>
    <body>
        <pre>
            {"exhibitor-single":[{"firstname":"Ines","lastname":"Konwiarz","email":"georg.jansen@020epos.de"}]}
        </pre>
    </body>
</html>

So I used the following code in my Apify web scraper task.

async function pageFunction(context) {
    const request = context.request;
    const $ = context.jQuery;

   var data =  $('body > pre').html();
   var items = JSON.parse(data);

       return {
        Url: request.url,
        Last_Name: items[`exhibitor-single`].lastname,
        First_Name: items[`exhibitor-single`].firstname,
        Email: items[`exhibitor-single`].email

        };
}

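The variable data uses the correct CSS selector for the JSON data. However, the task does not return any data. Can anyone help me find out what is wrong here? Thanks in advance.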

score 1 · Accepted Answer

Judging by the structure of the pageFunction, I guess you are using apify/web-scraper.

If you only want to get data from a JSON link, you can simply use apify/cheerio-scraper. It does not need to open a full browser, so it uses less compute.

In the Cheerio scraper, set the pageFunction as follows to get the JSON data:

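async function pageFunction(context) {
    // For JSON responses, Cheerio scraper exposes the parsed body as context.json.
    const { request, json } = context;
    // `exhibitor-single` is an array; the fields live on its first element.
    const [exhibitor = {}] = json['exhibitor-single'] || [];
    return {
        Url: request.url,
        Last_Name: exhibitor.lastname,
        First_Name: exhibitor.firstname,
        Email: exhibitor.email
    };
}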
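
Note that in the sample data from the question, `exhibitor-single` is an array holding a single object, so the name and email fields sit on its first element rather than on the array itself. Here is a minimal sketch of that lookup that runs in plain Node.js outside Apify. The helper name `toRecord` and the example URL are made up for illustration, and it assumes the response always has the shape shown in the question:

```javascript
// Hypothetical helper (not part of Apify): maps the parsed payload to an output record.
function toRecord(url, payload) {
    // `exhibitor-single` is expected to be an array; fall back to an empty list otherwise.
    const list = Array.isArray(payload['exhibitor-single'])
        ? payload['exhibitor-single']
        : [];
    // Take the first entry, or an empty object so the lookups below do not throw.
    const exhibitor = list[0] || {};
    return {
        Url: url,
        Last_Name: exhibitor.lastname,
        First_Name: exhibitor.firstname,
        Email: exhibitor.email,
    };
}

// Sample payload copied from the question; the URL is a placeholder.
const sample = JSON.parse(
    '{"exhibitor-single":[{"firstname":"Ines","lastname":"Konwiarz","email":"georg.jansen@020epos.de"}]}'
);
console.log(toRecord('https://example.com/exhibitor.json', sample));
```

If the array is missing or empty, the three fields come back as undefined instead of the function throwing, which makes a bad response easier to spot in the dataset.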

By default, Cheerio scraper only supports HTML responses, so you need to update the value of Additional mime types in the Advanced configuration:

[
  "application/json"
]
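
If you would rather stay with apify/web-scraper, something along these lines might work instead. This is only a sketch: it assumes jQuery injection is enabled (so `context.jQuery` is available, as in the question) and that the browser renders the JSON response inside `body > pre` as shown in the question. It reads the element with `.text()` rather than `.html()`, so HTML-escaped characters such as `&amp;` are decoded before parsing, and it indexes into the `exhibitor-single` array:

```javascript
async function pageFunction(context) {
    const request = context.request;
    const $ = context.jQuery;

    // .text() returns the decoded text content; trim the surrounding whitespace.
    const raw = $('body > pre').text().trim();
    const items = JSON.parse(raw);

    // `exhibitor-single` is an array, so read the fields from its first element.
    const exhibitor = (items['exhibitor-single'] || [])[0] || {};

    return {
        Url: request.url,
        Last_Name: exhibitor.lastname,
        First_Name: exhibitor.firstname,
        Email: exhibitor.email
    };
}
```

This still launches a full browser for every request, so the Cheerio scraper remains the lighter option for plain JSON endpoints.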
