regex - How to tell wget to download text files (in this case) that contain a specific string in the middle of the text file

Question

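I'm taking a software development course and trying to practice the "DRY" principle in everything I build, so as an exercise I want wget to download every file in the http://fusionplant.com/archive/textfiles/ directory that contains the word "offensive".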

Here is an example of one of them: http://fusionplant.com/archive/textfiles/gnu_fortune/gnu_fortune_offensive_astrology

Is there any way to do this? I assume it would involve regular expressions, but I couldn't find any examples online close enough to what I need.

Here is the command I tried. It's wrong, not even close, but here it is:

    wget -A '*offensive*.txt' http://fusionplant.com/archive/textfiles/gnu_fortune

It doesn't return an error message; it just downloads the index file:

    wget -A '*offensive*.txt' http://fusionplant.com/archive/textfiles/gnu_fortune
    --2012-06-15 11:15:07--  http://fusionplant.com/archive/textfiles/gnu_fortune
    Resolving fusionplant.com... 216.254.119.231
    Connecting to fusionplant.com|216.254.119.231|:80... connected.
    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: http://fusionplant.com/archive/textfiles/gnu_fortune/ [following]
    --2012-06-15 11:15:07--  http://fusionplant.com/archive/textfiles/gnu_fortune/
    Reusing existing connection to fusionplant.com:80.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Saving to: “gnu_fortune”

    [  <=>                                  ] 14,576      50.4K/s   in 0.3s

    2012-06-15 11:15:08 (50.4 KB/s) - “gnu_fortune” saved [14576]
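The log shows wget fetching only the one URL it was given: `-A` filters the links wget discovers while recursing, and the command has no `-r`. The pattern `*offensive*.txt` would also miss a name like `gnu_fortune_offensive_astrology`, which has no `.txt` suffix. If the goal is only to match file *names* (not contents), a sketch along these lines may work. It is untested against this server, and `fetch_offensive`, `accepts`, and the default directory `fortunes` are hypothetical names.

```shell
#!/bin/sh
# Sketch (untested against fusionplant.com): fetch files whose *names* contain "offensive".
# fetch_offensive URL [DIR]: hypothetical helper; DIR defaults to "fortunes".
fetch_offensive() {
  # -r     recurse, so wget reads the links in the directory index
  # -np    never climb above the starting directory (URL needs its trailing slash)
  # -nd    save everything flat in DIR instead of recreating the remote tree
  # -l 1   follow links only one level down from the index page
  # -A     accept only names matching the pattern; no ".txt", since the files have no suffix
  # -P     directory to save into
  wget -r -np -nd -l 1 -A '*offensive*' -P "${2:-fortunes}" "$1"
}

# accepts PATTERN NAME: exit 0 if NAME matches the wildcard PATTERN.
# wget's -A wildcards work roughly like shell globs, so this illustrates why
# '*offensive*.txt' matched nothing: the remote file names lack a ".txt" suffix.
accepts() {
  case $2 in
    $1) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage, keeping the trailing slash that `-np` relies on: `fetch_offensive http://fusionplant.com/archive/textfiles/gnu_fortune/`. wget still has to download the index page to read its links, but it should delete it afterwards because the page name does not match the accept list.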

score 0 · Accepted Answer

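You can't do that. You have to download the files and then check whether each one contains the string. You can't send the server a request that makes it do the search for you.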
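The download-then-check approach above can be sketched as a small shell function. This assumes the goal is to match file *contents*; `keep_matching` and the `fortunes` directory are hypothetical names, and `grep -F` is used so the search word is taken literally rather than as a regular expression.

```shell
#!/bin/sh
# Sketch: the server cannot search file contents for you, so download first
# and filter locally afterwards.
# keep_matching DIR WORD (hypothetical helper): delete every regular file in DIR
# whose contents do not contain WORD. WORD is matched literally (grep -F).
keep_matching() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue              # skip subdirectories and an empty DIR
    status=0
    grep -qF -- "$2" "$f" || status=$?   # 0 = found, 1 = not found, 2 = read error
    if [ "$status" -eq 1 ]; then         # delete only on a clean "not found"
      rm -f -- "$f"
    fi
  done
}
```

For example, download the whole directory with `wget -r -np -nd -l 1 -R 'index.html*' -P fortunes http://fusionplant.com/archive/textfiles/gnu_fortune/` (the `-R` pattern assumes the server's listing is saved as `index.html`), then run `keep_matching fortunes offensive`. Files that cannot be read are left in place rather than deleted.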
