linux - Shell script with wget - if statement nested in a for loop

Question

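I'm trying to write a shell script that reads a list of download URLs and checks whether they are still live. I'm not sure what's wrong with my current script (I'm new to this), so any pointers would be a big help!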

user@pc:~/test# cat sites.list

http://www.google.com/images/srpr/logo3w.png
http://www.google.com/doesnt.exist
notasite

Script:

#!/bin/bash
for i in `cat sites.list`
do
wget --spider $i -b
if grep --quiet "200 OK" wget-log; then
echo $i >> ok.txt
else
echo $i >> notok.txt
fi
rm wget-log
done

As it stands, the script writes everything to notok.txt (the first Google URL should go to ok.txt). But if I run:

wget --spider http://www.google.com/images/srpr/logo3w.png -b

and then:

grep "200 OK" wget-log

grep finds the string without any problem. What rookie syntax mistake am I making? Thanks m8s!

score 6 · Accepted Answer

The -b option sends wget to the background, so your grep runs before wget has finished writing wget-log.
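To see the ordering problem without touching the network, here is a minimal sketch in which a hypothetical slow_writer function stands in for wget: it writes a "200 OK" line to a hypothetical demo-log file after a one-second delay. Backgrounding it with & mimics -b, so the first grep looks at the log too early. Note that the wait step only works here because slow_writer is a child of the shell; wget -b detaches on its own, so wait would not help with the real command. The point is only the ordering.

```shell
#!/bin/bash
# Hypothetical stand-in for "wget -b": it writes its log only after a delay.
slow_writer() {
  sleep 1
  echo "HTTP request sent, awaiting response... 200 OK" > demo-log
}

rm -f demo-log

# Backgrounded, like wget -b: grep runs before the log has been written.
slow_writer &
if grep --quiet "200 OK" demo-log 2>/dev/null; then
  bg_result="found"
else
  bg_result="not found"
fi

# wait blocks until the background child finishes; now the log is complete.
wait
if grep --quiet "200 OK" demo-log; then
  wait_result="found"
else
  wait_result="not found"
fi

echo "before wait: $bg_result, after wait: $wait_result"
rm -f demo-log
```

Running it should print "before wait: not found, after wait: found", which is exactly the difference between the loop and the manual two-step test in the question.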

Try it without the -b option, piping wget's output (which goes to stderr) straight into grep:

if wget --spider "$i" 2>&1 | grep --quiet "200 OK" ; then
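Applied to the whole loop from the question, the fix might look like the sketch below. It wraps the loop in a hypothetical check_sites function and, borrowing points from the other answer, reads the list line by line, quotes "$url", and puts -- before it so a line starting with a hyphen is not taken as an option. The --tries=1 and --timeout=10 flags are an added assumption to keep dead hosts from stalling the loop; they are not part of the original answer.

```shell
#!/bin/bash
# Sketch of the question's loop with the accepted answer's fix applied:
# wget runs in the foreground (no -b) and its log output (stderr) is piped
# straight into grep, so no wget-log file has to be created or removed.
check_sites() {
  local list="$1" url
  while IFS= read -r url; do
    [[ -z "$url" ]] && continue   # skip blank lines
    # -- ends option parsing, so a URL beginning with "-" is not an option.
    if wget --spider --tries=1 --timeout=10 -- "$url" 2>&1 | grep --quiet "200 OK"; then
      echo "$url" >> ok.txt
    else
      echo "$url" >> notok.txt
    fi
  done < "$list"
}
```

Call it as check_sites sites.list. As in the original script, results are appended to ok.txt and notok.txt in the current directory.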

score 4 · Accepted Answer

There are a few problems with what you're doing.

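Your "for i in" loop will break on lines that contain spaces. It is better to use "while read" to read the file one line at a time.
You didn't quote your variables. What if a line in the file (or a word in a line) starts with a hyphen? Then wget will interpret it as an option. That is a potential security risk as well as a bug.
Creating and deleting files isn't really necessary. If all you're doing is checking whether a URL is reachable, you can do it without a temp file and the extra code to delete it.
wget isn't necessarily the best tool for this. I'd suggest using curl instead.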
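The first point is easy to demonstrate offline. This minimal sketch writes a hypothetical two-line file, demo.list, in which one line contains a space, then counts the items each loop style produces: "for i in $(cat ...)" splits on whitespace, while "while IFS= read -r" yields one item per line, with leading whitespace and backslashes preserved.

```shell
#!/bin/bash
# Hypothetical input: two lines, the first containing a space.
printf '%s\n' 'http://example.com/a b' 'http://example.com/c' > demo.list

# Word splitting: "for ... in $(cat ...)" yields one item per *word*.
for_count=0
for i in $(cat demo.list); do
  for_count=$((for_count + 1))
done

# "while IFS= read -r" yields one item per *line*.
while_count=0
while IFS= read -r line; do
  while_count=$((while_count + 1))
done < demo.list

echo "for loop saw $for_count items, while loop saw $while_count"
rm -f demo.list
```

The for loop sees three items ("http://example.com/a", "b" and "http://example.com/c"), while the while loop sees the two lines that are actually in the file.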

So here's a better way to handle this...

#!/bin/bash

sitelist="sites.list"
curl="/usr/bin/curl"

# Some errors, for good measure...
if [[ ! -f "$sitelist" ]]; then
  echo "ERROR: Sitelist is missing." >&2
  exit 1
elif [[ ! -s "$sitelist" ]]; then
  echo "ERROR: Sitelist is empty." >&2
  exit 1
elif [[ ! -x "$curl" ]]; then
  echo "ERROR: I can't work under these conditions." >&2
  exit 1
fi

# Allow extended pattern matching (the @(...) and ?(...) patterns in case..esac below).
# This needs extglob; globstar only enables ** in pathname expansion.
shopt -s extglob

# IFS= keeps leading/trailing whitespace; -r stops backslashes being treated as escapes.
while IFS= read -r url; do

  # remove comments
  url=${url%%#*}

  # skip empty lines
  if [[ -z "$url" ]]; then
    continue
  fi

  # Handle just ftp, http and https.
  # We could do full URL pattern matching, but meh.
  case "$url" in
    @(f|ht)tp?(s)://*)
      # Get just the numeric HTTP response code
      http_code=$($curl -sL -w '%{http_code}' "$url" -o /dev/null)
      case "$http_code" in
        200|226)
          # You'll get a 226 in ${http_code} from a valid FTP URL.
          # If all you really care about is that the response is in the 200's,
          # you could match against "2??" instead.
          echo "$url" >> ok.txt
          ;;
        *)
          # You might want different handling for redirects (301/302).
          echo "$url" >> notok.txt
          ;;
      esac
      ;;
    *)
      # If we're here, we didn't get a URL we could read.
      echo "WARNING: invalid url: $url" >&2
      ;;
  esac

done < "$sitelist"

This is untested. For educational purposes only. May contain nuts.
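The pattern-matching part of the script can be checked without any network access. The sketch below pulls the two case statements into hypothetical helper functions, classify_line and classify_code, that echo a label instead of writing files. Note that the @(...) and ?(...) patterns only parse when extglob is enabled, so the sketch turns it on before the functions are defined.

```shell
#!/bin/bash
# extglob enables @(...) and ?(...); it must be on before the case is parsed.
shopt -s extglob

# Hypothetical helper mirroring the URL filter in the script above.
classify_line() {
  local url="${1%%#*}"              # strip a "# comment" as the script does
  if [[ -z "$url" ]]; then
    echo "skip"
    return
  fi
  case "$url" in
    @(f|ht)tp?(s)://*) echo "fetch" ;;    # ftp, ftps, http, https
    *)                 echo "invalid" ;;
  esac
}

# Hypothetical helper mirroring the response-code check.
classify_code() {
  case "$1" in
    200|226) echo "ok" ;;           # 226 is what a successful FTP transfer reports
    *)       echo "notok" ;;
  esac
}
```

For example, classify_line notasite prints "invalid", so with the script above that entry triggers the warning on stderr rather than landing in notok.txt.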
