javascript - Scraping a website with node.js request and getting strange characters

Question

I am using nwjs (version 0.18.8) and sending requests to mangafox.me in order to build a manga reader.

When I request a manga page such as http://mangafox.me/manga/onepunch_man/vTBD/c066/1.html, I get these strange symbols:

��{s�F��[��w#Y�\�AI�(tY��dϯ��M%9��@�Cw��~��I(v��ʑ �y��t��k2z��o��y��.^~wɌ�e��Ҳ�]?c��Kf�=v��0�3 ? y`Y�_̘gY|fY��\�Q2��M��nV�iz�g��b$W�_a��c�C5

How can I fix this?

score 1 · Accepted Answer

Never mind x) The response body was simply gzip-compressed. If you run into the same problem, add gzip: true to the request options, for example:

request({url: '*****', gzip: true}, function (error, response, html) {
   // The callback parameters must match the names used below
   if (!error && response.statusCode == 200) {
      // Do something with the decompressed html
   }
});
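To see why the raw body looks like garbage, here is a minimal sketch of what the gzip option roughly does for you. It assumes the body is available as a Buffer (with the request library that would mean passing encoding: null); the helper names looksGzipped and decodeBody are hypothetical, and the compressed body is simulated locally instead of being fetched from the site.

```javascript
const zlib = require('zlib');

// Hypothetical helper: true if the buffer starts with the gzip magic bytes 0x1f 0x8b.
function looksGzipped(buf) {
  return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
}

// Hypothetical helper: gunzip the body if needed, then decode it as UTF-8 text.
function decodeBody(buf) {
  const raw = looksGzipped(buf) ? zlib.gunzipSync(buf) : buf;
  return raw.toString('utf8');
}

// Simulate a compressed response body rather than hitting the network.
const compressed = zlib.gzipSync(Buffer.from('<html><body>page 1</body></html>', 'utf8'));

console.log(looksGzipped(compressed)); // true
console.log(decodeBody(compressed));   // <html><body>page 1</body></html>
```

Printing the compressed buffer as a string, without gunzipping it first, produces the same kind of unreadable symbols shown in the question.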

score 0 · Accepted Answer

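You don't need node.js for something this simple. The easiest way to scrape a site is to load it into a hidden iframe and then loop over the collections of elements you need from the document.

The loaded document gives you everything in collections like these: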

 Frame.contentWindow.document.forms

 Frame.contentWindow.document.scripts

 Frame.contentWindow.document.styleSheets

 Frame.contentWindow.document.embeds

 Frame.contentWindow.document.cookie

 Frame.contentWindow.document.images

 Frame.contentWindow.document.links

and so on.
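As a rough sketch of looping over those collections, the hypothetical helper below gathers the image and link URLs from a document. It assumes the iframe has finished loading and that its document is readable from the page script; collectUrls is an illustrative name, not part of any API.

```javascript
// Hypothetical helper: gather image and link URLs from a loaded document.
// `doc` is expected to be something like Frame.contentWindow.document.
function collectUrls(doc) {
  return {
    // Array.from accepts array-like collections such as doc.images
    images: Array.from(doc.images, function (img) { return img.src; }),
    links: Array.from(doc.links, function (a) { return a.href; })
  };
}
```

In the page it would be called as collectUrls(Frame.contentWindow.document) from the iframe's load handler.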
