
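I'm trying to find out how to programmatically save a "snapshot" of a web page in IE and Firefox. The page URL is dynamic in nature, so the snapshot has to be captured at the moment the "take snapshot" button is triggered. Capturing the site with a server-side option will not work, because the URL is dynamic and the original page cannot be reached again with that URL.

Ideally, I'd like to implement a bookmarklet that the user can click. On click, the current web page is captured and saved (that is, the entire page, including any part not currently displayed), without any additional interaction from the user.

The user can then visit my website to view a "static" copy of the page (as a PDF or an image; are there other formats?).

I apologize if I've left out any required information from my question; I can clarify as needed.

Thanks, Sri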


1 Answer


To get the HTML of a page from a bookmarklet, read document.documentElement.outerHTML and send the markup back to your server via XHR (Ajax) or a form post.
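As a rough illustration of the capture step, here is a minimal sketch of the script a bookmarklet could run. The endpoint URL, the field names (url, html), and the function names are assumptions made up for this example, and the syntax assumes a reasonably modern browser. It uses a form post rather than XHR because a plain form submission to another origin does not need CORS support on the receiving server.

```javascript
// Hypothetical endpoint on your own site (an assumption for this sketch).
const SNAPSHOT_ENDPOINT = "https://example.com/snapshots";

// Collect the serialized page and the URL it was captured from.
function buildSnapshotFields(doc) {
  return {
    url: doc.location.href,
    html: doc.documentElement.outerHTML,
  };
}

// Send the fields to the server with a temporary form post.
function postSnapshot(doc, endpoint) {
  // Serialize first, so the temporary form is not part of the snapshot.
  const fields = buildSnapshotFields(doc);

  const form = doc.createElement("form");
  form.method = "POST";
  form.action = endpoint;
  form.target = "_blank"; // keep the captured page open
  form.style.display = "none";

  for (const name of Object.keys(fields)) {
    const input = doc.createElement("textarea");
    input.name = name;
    input.value = fields[name];
    form.appendChild(input);
  }

  doc.body.appendChild(form);
  form.submit();
  doc.body.removeChild(form);
}

// Only runs when a browser document is available.
if (typeof document !== "undefined") {
  postSnapshot(document, SNAPSHOT_ENDPOINT);
}
```

To turn this into a bookmarklet, the script would typically be hosted on your site and loaded by a short javascript: URL that injects a script element, since the whole body is awkward to fit into a bookmark.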

You can serve the saved HTML back to the user from your site, but links and images in it will break wherever the original URLs were relative paths. To fix that, the bookmarklet should rewrite all URLs in the page as full (absolute) URLs before capturing the HTML. Even then, images will still be broken when the other server blocks hotlinking through referrer checking or protects the images with a password. Simple referrer checking can be worked around by having your server download every image referenced in the saved HTML and then rewriting the paths in the saved HTML to point to the copies on your server. You would probably also want to remove any scripts from the page first.

Frames and iframes are a further problem. Frames from the same domain could be captured recursively, but frames from third-party sites are unreachable by the bookmarklet.
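The URL rewriting step could look something like the sketch below. The function names are made up for this example, and it assumes a browser that provides the standard URL constructor and document.baseURI. It only handles href and src attributes; URLs inside CSS or srcset attributes would need separate handling.

```javascript
// Resolve one attribute value against the page's base URL.
function toAbsoluteUrl(value, baseUrl) {
  try {
    return new URL(value, baseUrl).href;
  } catch (err) {
    return value; // leave values that cannot be parsed untouched
  }
}

// Rewrite every href and src attribute in the document to a full URL.
// Call this before reading document.documentElement.outerHTML.
function absolutizeUrls(doc) {
  const elements = Array.from(doc.querySelectorAll("[href], [src]"));
  for (const el of elements) {
    for (const attr of ["href", "src"]) {
      if (el.hasAttribute(attr)) {
        const absolute = toAbsoluteUrl(el.getAttribute(attr), doc.baseURI);
        el.setAttribute(attr, absolute);
      }
    }
  }
}
```

In a bookmarklet this would be called as absolutizeUrls(document). Note that it modifies the live page, which is usually harmless because the resolved URLs point to the same resources the browser was already using.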

How to automatically create a PDF or image from the saved HTML page is a separate question.

In general, there is no perfect solution when saving the HTML. The only perfect solution would be an add-on, extension, or program that captures a pixel-perfect image of exactly what the user is seeing. This cannot be done with a bookmarklet. There are probably add-ons that already do this, and you can start by digging into their source to see how they do it.

answered 2012-04-09T04:32:19.017