
I was trying to find a way of using wget to log the list of redirected website URLs into one file. For example:

www.website.com/1234 now redirects to www.newsite.com/a2as4sdf6nonsense

and

www.website.com/1235 now redirects to www.newsite.com/ab6haq7ah8nonsense

Wget does output the redirect, but doesn't log the new location. I get this in the terminal:

    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: http://www.newsite.com/a2as4sdf6

...

I would just like to capture that new URL to a file.

I was using something like this:

    for i in `seq 1 9999`; do
        wget http://www.website.com/$i -O output.txt
    done

But this outputs the source code of each web page to that file. I want to retrieve only the redirect info, and I would like to append a new line to the same output file for each URL it retrieves.

I would like the output to look something like:

    www.website.com/1234 www.newsite.com/a2as4sdf6nonsense
    www.website.com/1235 www.newsite.com/ab6haq7ah8nonsense

...


1 Answer


It's not a perfect solution, but it works:

    wget http://tinyurl.com/2tx --server-response -O /dev/null 2>&1 |\
        awk '(NR==1){SRC=$3;} /^  Location: /{DEST=$2} END{ print SRC, DEST}'

wget is not the perfect tool for this; curl would be a bit better.
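
For example, here is a minimal sketch of the curl approach, assuming a curl recent enough to support the `%{redirect_url}` write-out variable (the URL is just the placeholder from your question):

    # Print the requested URL and the Location target of the first response
    curl -s -o /dev/null -w '%{url_effective} %{redirect_url}\n' http://www.website.com/1234

    # Or follow all redirects (-L) and print only the final URL reached
    curl -sL -o /dev/null -w '%{url_effective}\n' http://www.website.com/1234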

Here is how it works: we fetch the URL, but redirect all output (the page content) to /dev/null. We ask wget to print the server's HTTP response headers (to get the Location header) and pipe them to awk. Note that there may be several redirects; I assumed you want the last one. awk takes the URL you requested from the first line (NR==1) and the destination URL from each Location header. At the end it prints SRC and DEST, in the format you wanted.
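
To get one line per URL in a single file, the same pipeline can be dropped into the seq loop from your question, appending the output to output.txt. A sketch (untested, using the placeholder URLs from the question):

    for i in $(seq 1 9999); do
        wget "http://www.website.com/$i" --server-response -O /dev/null 2>&1 |\
            awk '(NR==1){SRC=$3} /^  Location: /{DEST=$2} END{print SRC, DEST}'
    done >> output.txt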

answered 2012-08-09T08:00:59.987