header - wget without any headers

Question

I want to fetch a file without any headers. I have tried many things, for example

wget --header="" http://xxxxx.xxxxxx.xx

How can I fetch a file without sending any headers?

score -1 · Accepted Answer

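'--header=header-line' Send header-line along with the rest of the headers in each HTTP request. The supplied header is sent as-is, which means it must contain name and value separated by a colon, and must not contain newlines. You may define more than one additional header by specifying '--header' more than once.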
      wget --header='Accept-Charset: iso-8859-2' \
           --header='Accept-Language: hr'        \
             http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all previous user-defined headers.

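As of Wget 1.10, this option can be used to override headers otherwise generated automatically. This example instructs Wget to connect to localhost, but to specify 'foo.bar' in the Host header:
      wget --header="Host: foo.bar" http://localhost/
In versions of Wget prior to 1.10, such use of '--header' caused sending of duplicate headers.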

http://www.gnu.org/software/wget/manual/html_node/HTTP-Options.html
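To make the quoted rules concrete, here is a minimal Python sketch that models them. It does not call wget. The function name `effective_headers` is hypothetical, and `AUTO` is a placeholder list standing in for wget's automatically generated headers, not the exact set wget sends.

```python
# Hypothetical model of the --header rules quoted from the Wget manual:
#   * each --header value is one "Name: value" line, sent as-is
#   * an empty string clears all previously given user-defined headers
#   * (Wget >= 1.10) a user header whose name matches an automatic one replaces it
# AUTO is an illustrative placeholder, not wget's real automatic header list.

def effective_headers(auto_headers, header_args):
    """Return the header lines a request would carry under these rules."""
    user = []
    for arg in header_args:
        if arg == "":
            user.clear()  # --header="" drops earlier *user* headers only
            continue
        if "\n" in arg or ":" not in arg:
            raise ValueError(f"invalid header line: {arg!r}")
        user.append(arg)

    user_names = {line.split(":", 1)[0].strip().lower() for line in user}
    # Automatic headers survive unless a user header with the same name overrides them.
    kept = [line for line in auto_headers
            if line.split(":", 1)[0].strip().lower() not in user_names]
    return kept + user


AUTO = ["User-Agent: Wget/1.x", "Accept: */*", "Host: localhost"]  # placeholder

print(effective_headers(AUTO, ["Accept-Language: hr", ""]))
# -> ['User-Agent: Wget/1.x', 'Accept: */*', 'Host: localhost']
print(effective_headers(AUTO, ["Host: foo.bar"]))
# -> ['User-Agent: Wget/1.x', 'Accept: */*', 'Host: foo.bar']
```

As the first call shows, under these rules `--header=""` only resets the headers you supplied yourself; the automatically generated ones are still sent, so it cannot produce a request with no headers at all.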

score -1 · Accepted Answer

Could you assign the output of wget to a string and then use something else to process it and remove the headers (or parse them out of the text)?

For example, with bash and grep, you can store the HTML of a web page in a string and then use grep to extract the text in the <body> section:

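w1=$(wget --quiet --output-document - www.example.com)   # page HTML goes to stdout, saved in w1
# $w1 is unquoted: word splitting joins the lines, so one match can span <body>...</body>
echo $w1 | grep --only-matching "<body>.*</body>"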

This gives the output below (I added some line breaks to improve how it displays here):

<body> <div> 
<h1>Example Domain</h1> <p> 
This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for 
 permission.
</p> <p>
<a href="http://www.iana.org/domains/example">More information...</a></p> 
</div> </body>
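The same extraction can be done without relying on shell word splitting. Below is a minimal Python sketch, assuming the page has already been downloaded into a string (for example with the wget command above); `extract_body` and the sample `page` are illustrative and not part of the original answer. Unlike the grep pattern, `<body[^>]*>` also accepts a body tag that carries attributes.

```python
import re

# re.DOTALL lets ".*" cross newlines, so the lines need not be joined first;
# the greedy ".*" runs to the last </body>, like the grep pattern above.
BODY_RE = re.compile(r"<body[^>]*>.*</body>", re.DOTALL | re.IGNORECASE)


def extract_body(html):
    """Return the <body>...</body> section of an HTML string, or None if absent."""
    match = BODY_RE.search(html)
    return match.group(0) if match else None


page = """<!doctype html>
<html>
<head><title>Example Domain</title></head>
<body>
<div>
<h1>Example Domain</h1>
</div>
</body>
</html>"""

print(extract_body(page))
```

A regular expression is a rough tool for HTML; for anything beyond a quick extraction, Python's `html.parser` module is a more robust choice.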
