2

A friend accidentally deleted his forum database. That wouldn't normally be a huge issue, except that he neglected to perform backups. Two years of content is just plain gone. Obviously, he's learned his lesson.

The good news, however, is that Google keeps backups, even if individual site owners are idiots. The bad news is that traditional crawling robots would choke on the Google Cache version of the website.

Is there anything existing that would help trawl the Google Cache, or how would I go about rolling my own?

4

2 Answers

4

You may want to consider crawling the archive.org cache as well. If the site was captured there, the copy is generally better structured.
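
As a rough starting point for rolling your own, here is a minimal Python sketch that lists archived pages through the Wayback Machine's CDX search endpoint and turns each row into a snapshot URL. None of this comes from the answer itself: the function names and the `forum.example.com` domain are hypothetical, and the query parameters (`output=json`, `fl`, `filter`, `collapse`) and the `id_` raw-content suffix are my assumptions about how that service behaves today, so check them against the archive.org documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wayback Machine CDX search service.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain):
    """Build a CDX query listing every captured URL under `domain`."""
    params = {
        "url": domain + "/*",          # everything below the domain
        "output": "json",              # JSON rows instead of plain text
        "fl": "timestamp,original",    # only the two fields we need
        "filter": "statuscode:200",    # skip redirects and errors
        "collapse": "urlkey",          # one row per distinct URL
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def snapshot_urls(cdx_rows):
    """Convert CDX JSON rows into snapshot URLs.

    With output=json the first row is assumed to be a header naming the
    fields, so it is skipped. The `id_` suffix asks for the page as it was
    captured, without the Wayback toolbar and rewritten links.
    """
    return [
        f"https://web.archive.org/web/{timestamp}id_/{original}"
        for timestamp, original in cdx_rows[1:]
    ]


def list_snapshots(domain):
    """Query the CDX endpoint over the network and return snapshot URLs."""
    with urllib.request.urlopen(build_cdx_url(domain)) as resp:
        body = resp.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    return snapshot_urls(rows)
```

Calling `list_snapshots("forum.example.com")` would return a list of URLs you can then download one by one, ideally with a delay between requests so you don't hammer the archive.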

Answered 2008-12-16T19:17:42.317
0

If the site is small enough that you can crawl it by hand, this userscript for seamlessly navigating the Google Cache is very useful.

Answered 2013-12-18T02:15:27.083