linux - wget: do not follow redirects

Question

How do I prevent wget from following redirects?

score 54 · Accepted Answer

--max-redirect 0

I haven't tried this; it will either allow no redirects or allow an unlimited number.

answered 2010-04-18T16:25:52.683
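A minimal sketch of how that option might be wrapped for reuse. The wget manual describes --max-redirect as the maximum number of redirections to follow (default 20), so a value of 0 should mean "follow none". The helper name fetch_no_redirect and the example URL are assumptions made for illustration, not part of wget.

```shell
# fetch_no_redirect is a hypothetical helper name, not a wget feature.
fetch_no_redirect() {
    local url="$1"
    # --max-redirect=0 sets the number of redirections wget may follow to zero,
    # so a 3xx response is not chased to its Location target.
    wget --max-redirect=0 "$url"
}

# Example call (placeholder URL):
# fetch_no_redirect "http://example.com/"
```

When the server answers with a redirect, wget should report that the redirection limit was exceeded instead of requesting the new location.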

score 16 · Accepted Answer

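Use curl without -L instead of wget. Omitting that option when using curl prevents redirects from being followed.

If you use curl -I <URL>, you get the headers instead of the redirect HTML.

If you use curl -IL <URL>, you get the headers for the URL plus the headers for the URL you are redirected to.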
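If only the redirect itself is of interest, a sketch along these lines prints the status code and the redirect target without following it. It relies on curl's --write-out variables http_code and redirect_url. The helper name show_redirect and the example URL are assumptions made for illustration.

```shell
# show_redirect is a hypothetical helper name.
show_redirect() {
    local url="$1"
    # No -L here, so curl stops at the first response.
    # --output /dev/null discards the body; --write-out prints the status
    # code and the URL a redirect would have led to (empty if none).
    curl --silent --output /dev/null \
         --write-out '%{http_code} %{redirect_url}\n' "$url"
}

# Example call (placeholder URL):
# show_redirect "http://example.com/"
```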

score 5 · Accepted Answer

Some versions of wget have a --max-redirect option: see here

answered 2010-04-18T16:25:54.700

score 3 · Accepted Answer

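By default, wget follows up to 20 redirects. However, it does not span hosts. If you ask wget to download example.com, it will not touch any resources at www.example.com. wget detects this as a request that spans to another host and decides against it.

In short, you should probably run: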

wget --mirror www.example.com

instead of

wget --mirror example.com

Now suppose the owner of www.example.com has several subdomains under example.com and we are interested in all of them. How do we proceed?

Try this:

wget --mirror --domains=example.com example.com

wget will now visit all subdomains of example.com, including m.example.com and www.example.com.
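One caveat: the wget manual notes that --domains does not by itself turn on --span-hosts. A hedged sketch that combines the two, assuming the goal is a recursive mirror that may move between subdomains of one domain, could look like this. The helper name mirror_with_subdomains is an assumption made for illustration.

```shell
# mirror_with_subdomains is a hypothetical helper name.
mirror_with_subdomains() {
    local domain="$1"
    # --span-hosts allows recursive retrieval to leave the starting host;
    # --domains then restricts the hosts that may be followed to this domain.
    wget --mirror --span-hosts --domains="$domain" "$domain"
}

# Example call (placeholder domain):
# mirror_with_subdomains example.com
```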

score 3 · Accepted Answer

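In general, relying on a specific number of redirects is not a good idea.

For example, to download IntelliJ IDEA, the URL that promises to always resolve to the latest Linux Community Edition looks like https://download.jetbrains.com/product?code=IIC&latest&distribution=linux, but if you visit that URL today you are redirected twice before you reach the actual downloadable file. In the future you might be redirected three times, or not at all.

The way to solve this is to use the HTTP HEAD verb. Here is how I solved it in the IntelliJ IDEA case: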

# This is the starting URL.
URL="https://download.jetbrains.com/product?code=IIC&latest&distribution=linux"
echo "URL: $URL"

# Issue HEAD requests until the actual target is found.
# The result contains the target location, among some irrelevant stuff.
LOC=$(wget --no-verbose --method=HEAD --output-file - "$URL")
echo "LOC: $LOC"

# Extract the URL from the result, stripping the irrelevant stuff.
URL=$(cut "--delimiter= " --fields=4 <<< "$LOC")
echo "URL: $URL"

# Optional: download the actual file.
wget "$URL"
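To make the extraction step easier to follow, here is a small sketch of the same field-4 cut applied to a sample line. The sample is illustrative only, not captured wget output; its shape (date, time, a "URL:" label, the resolved URL, then the status) is an assumption inferred from the script's choice of the fourth space-separated field. The helper name extract_target_url and the example URL are likewise hypothetical.

```shell
# extract_target_url is a hypothetical helper name.
extract_target_url() {
    # Split stdin on single spaces and print the fourth field.
    cut -d ' ' -f 4
}

# Illustrative line only (assumed shape, not real wget output).
sample='2024-01-01 12:00:00 URL: https://example.com/ideaIC.tar.gz 200 OK'
printf '%s\n' "$sample" | extract_target_url
```

For this sample the function prints https://example.com/ideaIC.tar.gz. If a given wget version formats the line differently, the field number would need adjusting.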
