download - How to avoid downloading a link with wget

Question

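I am trying to download some pages of the site http://computerone.altervista.org, just for testing purposes.

My goal is to download only the pages matching the patterns "*JavaScript*" and "*index*".

Actually, if I try the following options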

wget \
-A "*Javascript*, *index*" \
--exclude-domains http://computerone.altervista.org/rss-articles/ \
-e robots=off \
--mirror -E -k -p -np -nc --convert-links  \
--wait=5 -c  \
http://computerone.altervista.org

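it works as expected, except that it also tries to download http://computerone.altervista.org/rss-articles/.

My questions are:

Why does it try to download the http://computerone.altervista.org/rss-articles/ page?
How can I avoid it? I tried the option --exclude-domains http://computerone.altervista.org/rss-articles/, but it still tries to download it.

PS:
Looking at the page source, I found: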

<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="rss-articles/" />

score 2 · Accepted Answer

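wget -p downloads all page requisites.

From man wget:

To finish off this topic, it's worth knowing that Wget's idea of an external document link is any URL specified in an <A> tag, an <AREA> tag, or a <LINK> tag other than <LINK REL="stylesheet">.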
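To see which tags on a page fall under that rule, a rough check along these lines may help. It is only an approximation: wget uses its own HTML parser, not grep, and the helper name document_links and the stylesheet line in the sample are made up for illustration. Only the first sample tag comes from the question.

```shell
#!/bin/sh
# Rough approximation only: wget parses HTML properly, this just greps.
# Print <link> tags that are NOT stylesheets, i.e. the ones the man page
# says wget treats as links to external documents.
document_links() {
    grep -i '<link' | grep -iv 'rel="stylesheet"'
}

# Sample markup: the first tag is the one from the question; the second is
# a hypothetical stylesheet reference added for contrast.
sample='<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="rss-articles/" />
<link rel="stylesheet" type="text/css" href="style.css" />'

# Only the rel="alternate" tag should be printed.
printf '%s\n' "$sample" | document_links
```

The rel="alternate" tag survives the filter, which matches what the question observed: wget treats its href (rss-articles/) as a link to another document and tries to fetch it.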

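To exclude rss-articles, use -X or --exclude-directories. Note that --exclude-domains expects a list of domain names, not URL paths, so it cannot exclude a directory on the same host.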

wget -A "*Javascript*, *index*" -X "rss-articles" -e robots=off --mirror -E -k -p -np -nc -c http://computerone.altervista.org
