19

I started a wget mirror with "wget --mirror [sitename]", and it was working fine, but I accidentally interrupted the process.

I now want to resume the mirror with the following caveats:

  • If wget has already downloaded a file, I don't want it downloaded again. I don't even want wget to check the timestamp: I know the version I have is "recent enough".

  • I do want wget to read the files it's already downloaded and follow links inside those files.

I can use "-nc" for the first point above, but I can't seem to coerce wget to read through files it's already downloaded.

Things I've tried:

  • The obvious "wget -c -m" doesn't work, because it wants to compare timestamps, which requires making at least a HEAD request to the remote server.

  • "wget -nc -m" doesn't work, since -m implies -N, and -nc is incompatible with -N.

  • "wget -F -nc -r -l inf" is the best I could come up with, but it still fails. I was hoping "-F" would coerce wget into reading local, already-downloaded files as HTML, and thus follow links, but this doesn't appear to happen.

  • I tried a few other options (like "-c" and "-B [sitename]"), but nothing works.

How do I get wget to resume this mirror?

4

2 Answers

11

Apparently this works:

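Solved: Wget error "Can't timestamp and not clobber old files at the same time." Posted on February 4, 2012. While trying to resume a site-mirror operation I was running through Wget, I ran into the error "Can't timestamp and not clobber old files at the same time". It turns out that Wget cannot run with both the -N and -nc flags set, so if you want to resume a recursive download with noclobber you have to disable -N. The -m option (for mirroring) implicitly sets -N, so you have to switch from -m to -r in order to use noclobber.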

From: http://www.marathon-studios.com/blog/solved-wget-error-cant-timestamp-and-not-clobber-old-files-at-the-same-time/

answered 2014-01-24T15:15:49.210
6

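According to the wget manual, -m is equivalent to this longer series of settings: -r -N -l inf --no-remove-listing. Just use those settings instead of -m, and leave out -N (timestamping).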
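As a minimal sketch of that substitution: the function below spells out what -m expands to, drops -N, and adds -nc so that files already on disk are skipped. The function name `resume_mirror` and the example URL are placeholders of mine, not something from the wget manual, and this has not been checked against every wget version.

```shell
# Resume an interrupted "wget --mirror" without re-fetching files that
# already exist locally.
#   -r -l inf --no-remove-listing : what -m expands to, minus -N
#   -nc                           : do not download a file that is already on disk
# resume_mirror is a hypothetical helper name; any extra arguments are
# passed straight through to wget.
resume_mirror() {
    wget -r -l inf --no-remove-listing -nc "$@"
}

# Usage (placeholder URL, substitute your own site):
#   resume_mirror http://example.com/
```

Note that plain -r without -l inf would stop at wget's default recursion depth of 5, which is why the depth is kept explicit here.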

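Now, I'm not sure whether there is a way to make wget download the URLs found in existing HTML files. There may be a solution: I know wget can take an HTML file as input and scrape all the links in it. Perhaps you could use a bash command to concatenate all the HTML files into one big file.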
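A rough sketch of the concatenation idea, in case someone wants to try it. The helper name `collect_html`, the directory and file names, and the URL are all illustrative assumptions. Whether wget then follows the links as hoped is untested; the asker reported that -F alone did not help, and relative links would be resolved against the single base URL rather than each page's own location.

```shell
# Concatenate every already-downloaded .html/.htm file under a mirror
# directory into one file that can be handed to wget as an input file.
# collect_html is a hypothetical helper, not part of wget.
#   $1: mirror directory
#   $2: output file (keep it OUTSIDE the mirror directory so that find
#       does not pick it up as one of its own inputs)
collect_html() {
    find "$1" -type f \( -name '*.html' -o -name '*.htm' \) -exec cat {} + > "$2"
}

# Usage (placeholder names; an untested idea, not a confirmed fix):
#   collect_html ./example.com /tmp/all-links.html
#   wget -nc -r -l inf --no-remove-listing -F -B http://example.com/ -i /tmp/all-links.html
```

Here -i reads URLs from the file, -F tells wget to treat that file as HTML, and -B supplies the base URL for relative links.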

I solved this by deleting all the HTML files, since I didn't mind re-downloading just those. But this might not work for everyone's use case.
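For reference, the delete-and-refetch workaround might look roughly like this. It assumes the pages were saved with an .html or .htm suffix (pages saved under other names, such as index.php, would not be matched), and the helper name `drop_html` and the paths are placeholders.

```shell
# Delete only the already-downloaded HTML pages under a mirror directory,
# leaving images, archives and other files in place, so that a later
# "wget -r -nc" run re-fetches the pages and follows their links.
# drop_html is a hypothetical helper name.
#   $1: mirror directory
drop_html() {
    find "$1" -type f \( -name '*.html' -o -name '*.htm' \) -exec rm -f {} +
}

# Usage (placeholder names; this removes files, so check the path first):
#   drop_html ./example.com
#   wget -r -l inf --no-remove-listing -nc http://example.com/
```

The trade-off is that every HTML page is downloaded again, which is cheap when most of the mirror's size is in non-HTML files.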

answered 2016-09-17T10:07:22.330