
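We are building a kind of web crawler: the user enters a website URL and our web application generates a screenshot of the site. We use PhantomJS rendering to produce the screenshots in PNG format. It works like a charm in most cases, but some websites are not rendered correctly. For example, this is how http://dorevi.lt/ looks in a browser: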

Browser snapshot (tested on the latest Chrome and IE)

However, the screenshot rendered by PhantomJS looks like this:

Image rendered by PhantomJS 2.1

You can see that it stretches the center table and breaks the content in the middle. What I have tried so far:

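  1. Tried putting various delays between loading the page and rendering it, even as long as 30 seconds, but no luck.

  2. Tried all the solutions in this answer, where we wait for the DOM content (internal stylesheets, etc.) to load, but the output is the same.

  3. Tried adding every possible argument when executing the PhantomJS script. This is what my final command looks like: phantomjs.exe --ignore-ssl-errors=true --load-images=true --ssl-protocol=any --debug=true --local-to-remote-url-access=true --web-security=false --disk-cache=false script.js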

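As you can see, I have used every possible flag as well, but the output is still the same. Please help, as we need to make sure the generated web page screenshots are accurate.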
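For readers who want to toggle these flags one at a time while testing, the command line from item 3 can be assembled from an options object. This is only a minimal sketch: `buildPhantomArgs` is a hypothetical helper name, not part of PhantomJS, and the flag names and values are simply the ones quoted in the command above.

```javascript
// Sketch: build the PhantomJS argument list from an options object.
// buildPhantomArgs is a hypothetical helper, not a PhantomJS API.
function buildPhantomArgs(options, script) {
  // Each entry becomes "--name=value", in the order the keys were written.
  const flags = Object.keys(options).map(function (name) {
    return '--' + name + '=' + String(options[name]);
  });
  // The script path goes last, after all flags.
  return flags.concat([script]);
}

// The same flags as in the command quoted in the question.
const args = buildPhantomArgs({
  'ignore-ssl-errors': true,
  'load-images': true,
  'ssl-protocol': 'any',
  'debug': true,
  'local-to-remote-url-access': true,
  'web-security': false,
  'disk-cache': false
}, 'script.js');

// Print the full command line for inspection.
console.log(['phantomjs.exe'].concat(args).join(' '));
```

The resulting array could be handed to something like Node's `child_process.execFile`, or the same idea ported to the PHP side of the application. Removing a single key from the object makes it easy to see whether any one flag changes the rendered output.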

Info: PhantomJS version used: 2.1. OS: CentOS in production; also tested on Windows 7 with the same output. Technology: the application is built with PHP.

Edit 1: added the --debug=true output

2017-12-09T15:31:40 [DEBUG] CookieJar - Created but will not store cookies (use
option '--cookies-file=<filename>' to enable persistent cookie storage)
2017-12-09T15:31:41 [DEBUG] Set  "http"  proxy to:  "" : 1080
2017-12-09T15:31:41 [DEBUG] Phantom - execute: Configuration
2017-12-09T15:31:41 [DEBUG]      0 objectName : ""
2017-12-09T15:31:41 [DEBUG]      1 cookiesFile : ""
2017-12-09T15:31:41 [DEBUG]      2 diskCacheEnabled : "true"
2017-12-09T15:31:41 [DEBUG]      3 maxDiskCacheSize : "-1"
2017-12-09T15:31:41 [DEBUG]      4 diskCachePath : ""
2017-12-09T15:31:41 [DEBUG]      5 ignoreSslErrors : "true"
2017-12-09T15:31:41 [DEBUG]      6 localUrlAccessEnabled : "true"
2017-12-09T15:31:41 [DEBUG]      7 localToRemoteUrlAccessEnabled : "true"
2017-12-09T15:31:41 [DEBUG]      8 outputEncoding : "UTF-8"
2017-12-09T15:31:41 [DEBUG]      9 proxyType : "http"
2017-12-09T15:31:41 [DEBUG]      10 proxy : ":1080"
2017-12-09T15:31:41 [DEBUG]      11 proxyAuth : ":"
2017-12-09T15:31:41 [DEBUG]      12 scriptEncoding : "UTF-8"
2017-12-09T15:31:41 [DEBUG]      13 webSecurityEnabled : "false"
2017-12-09T15:31:41 [DEBUG]      14 offlineStoragePath : ""
2017-12-09T15:31:41 [DEBUG]      15 localStoragePath : ""
2017-12-09T15:31:41 [DEBUG]      16 localStorageDefaultQuota : "-1"
2017-12-09T15:31:41 [DEBUG]      17 offlineStorageDefaultQuota : "-1"
2017-12-09T15:31:41 [DEBUG]      18 printDebugMessages : "true"
2017-12-09T15:31:41 [DEBUG]      19 javascriptCanOpenWindows : "true"
2017-12-09T15:31:41 [DEBUG]      20 javascriptCanCloseWindows : "true"
2017-12-09T15:31:41 [DEBUG]      21 sslProtocol : "any"
2017-12-09T15:31:41 [DEBUG]      22 sslCiphers : "ECDHE-ECDSA-AES128-GCM-SHA256:
ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA:ECD
HE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-RC4-SH
A:ECDHE-RSA-RC4-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA:DHE-RSA-AES256-SHA:AES
128-GCM-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:RC4-SHA:RC4-MD5"
2017-12-09T15:31:41 [DEBUG]      23 sslCertificatesPath : ""
2017-12-09T15:31:41 [DEBUG]      24 sslClientCertificateFile : ""
2017-12-09T15:31:41 [DEBUG]      25 sslClientKeyFile : ""
2017-12-09T15:31:41 [DEBUG]      26 sslClientKeyPassphrase : ""
2017-12-09T15:31:41 [DEBUG]      27 webdriver : ":"
2017-12-09T15:31:41 [DEBUG]      28 webdriverLogFile : ""
2017-12-09T15:31:41 [DEBUG]      29 webdriverLogLevel : "INFO"
2017-12-09T15:31:41 [DEBUG]      30 webdriverSeleniumGridHub : ""
2017-12-09T15:31:41 [DEBUG] Phantom - execute: Script & Arguments
2017-12-09T15:31:41 [DEBUG]      script: "script.js"
2017-12-09T15:31:41 [DEBUG] Phantom - execute: Starting normal mode
2017-12-09T15:31:41 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:41 [DEBUG] FileSystem - _open: ":/modules/fs.js" QMap(("mode",
QVariant(QString, "r")))
2017-12-09T15:31:41 [DEBUG] FileSystem - _open: ":/modules/system.js" QMap(("mod
e", QVariant(QString, "r")))
2017-12-09T15:31:41 [DEBUG] FileSystem - _open: ":/modules/webpage.js" QMap(("mo
de", QVariant(QString, "r")))
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 10
2017-12-09T15:31:42 [DEBUG] CookieJar - Saved "CMSSESSID8694f4a4=kpca79mq05g4v0f
nh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 30
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 32
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 35
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 37
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 39
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 41
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 43
2017-12-09T15:31:42 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 46
2017-12-09T15:31:42 [DEBUG] WebPage - updateLoadingProgress: 48
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 52
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 55
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 58
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 60
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 63
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 67
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 69
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 71
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 74
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 76
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 78
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 81
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 83
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 85
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 87
2017-12-09T15:31:43 [DEBUG] WebPage - updateLoadingProgress: 100
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "CMSSESSID8694f4a4=kpca79mq05g4v0f
nh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_ga=GA1.2.690650226.1512813703; e
xpires=Mon, 09-Dec-2019 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "CMSSESSID8694f4a4=kpca79mq05g4v0f
nh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_ga=GA1.2.690650226.1512813703; e
xpires=Mon, 09-Dec-2019 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_gid=GA1.2.860165508.1512813703;
expires=Sun, 10-Dec-2017 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "CMSSESSID8694f4a4=kpca79mq05g4v0f
nh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_ga=GA1.2.690650226.1512813703; e
xpires=Mon, 09-Dec-2019 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_gid=GA1.2.860165508.1512813703;
expires=Sun, 10-Dec-2017 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] CookieJar - Saved "_gat=1; expires=Sat, 09-Dec-2017
10:02:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:43 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] WebPage - updateLoadingProgress: 10
2017-12-09T15:31:53 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] WebPage - updateLoadingProgress: 100
2017-12-09T15:31:53 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/fs.js" QMap(("mode",
QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/system.js" QMap(("mod
e", QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/webpage.js" QMap(("mo
de", QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] WebPage - updateLoadingProgress: 10
2017-12-09T15:31:53 [DEBUG] WebPage - setupFrame ""
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/fs.js" QMap(("mode",
QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/system.js" QMap(("mod
e", QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] FileSystem - _open: ":/modules/webpage.js" QMap(("mo
de", QVariant(QString, "r")))
2017-12-09T15:31:53 [DEBUG] WebPage - updateLoadingProgress: 100
2017-12-09T15:31:53 [DEBUG] CookieJar - Purged (session) "CMSSESSID8694f4a4=kpca
79mq05g4v0fnh31uvkmu86; domain=dorevi.lt; path=/"
2017-12-09T15:31:53 [DEBUG] CookieJar - Saved "_ga=GA1.2.690650226.1512813703; e
xpires=Mon, 09-Dec-2019 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:53 [DEBUG] CookieJar - Saved "_gid=GA1.2.860165508.1512813703;
expires=Sun, 10-Dec-2017 10:01:43 GMT; domain=.dorevi.lt; path=/"
2017-12-09T15:31:53 [DEBUG] CookieJar - Saved "_gat=1; expires=Sat, 09-Dec-2017
10:02:43 GMT; domain=.dorevi.lt; path=/"

1 Answer


Unfortunately, PhantomJS 2.1.1 is indeed outdated and has been abandoned in favor of Puppeteer, which drives headless Chrome.

Here is a screenshot taken with it:

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  // Start headless Chrome and open a new tab.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Load the page, then capture its full scrollable height as a PNG.
  await page.goto('https://www.nytimes.com/');
  await page.screenshot({path: 'full.png', fullPage: true});

  await browser.close();
})();
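
For the screenshot service described in the question, the snippet above could be wrapped into a reusable function that takes the URL as input. The following is a minimal sketch under stated assumptions: `outputPathFor` and `capture` are hypothetical helper names, the 1366x768 viewport is an arbitrary desktop-like size, and setting the viewport explicitly is only a guess at what might matter for layouts like the stretched table, not a confirmed cause.

```javascript
// Sketch only: outputPathFor and capture are hypothetical helper names.

// Derive a file name such as "dorevi.lt.png" from the page URL.
function outputPathFor(url) {
  const host = new URL(url).hostname;
  // Replace anything outside letters, digits, dots and hyphens.
  return host.replace(/[^a-z0-9.-]/gi, '_') + '.png';
}

// Render one URL to a full-page PNG and resolve with the file path.
async function capture(url) {
  // Loaded lazily so outputPathFor works without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Assumed desktop-like size; adjust to the layout you want to capture.
    await page.setViewport({ width: 1366, height: 768 });
    // Wait until network activity has mostly settled before capturing.
    await page.goto(url, { waitUntil: 'networkidle2' });
    const path = outputPathFor(url);
    await page.screenshot({ path: path, fullPage: true });
    return path;
  } finally {
    // Always close the browser, even if navigation or capture fails.
    await browser.close();
  }
}
```

A call such as `capture('http://dorevi.lt/').then(console.log)` would then print the name of the written file. Closing the browser in `finally` is a deliberate choice for a long-running service, so that a failed page load does not leave Chrome processes behind.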
answered 2017-12-09T09:56:43.997