bash - "while read LINE do" and grep problems

Question

I have two files.

file1.txt:  
Afghans  
Africans  
Alaskans  
...

and file2.txt contains the output of wget run on a web page, so it is a big sloppy mess, but it does contain many of the words from the first list.

Script:

cat file1.txt | while read LINE; do grep $LINE file2.txt; done

This did not work as expected. I wondered why, so I echoed the $LINE variable inside the loop and added a sleep 1 so I could see what was happening:

cat file1.txt | while read LINE; do echo $LINE; sleep 1; grep $LINE file2.txt; done

The output in the terminal looks something like this:

Afghans
Africans
Alaskans
Albanians
Americans
grep: Chinese: No such file or directory
: No such file or directory
..

So you can see that it did finally find the word "Asian". But why does it say:

No such file or directory

Is something weird going on here, or am I missing something?

score 5 · Accepted Answer


What about:

grep -f file1.txt file2.txt

Answered 2011-04-11T21:54:44.573

score 3 · Accepted Answer

@OP, first, use dos2unix as suggested. Then use awk:

awk 'FNR==NR{a[$1];next}{ for(i=1;i<=NF;i++){ if($i in a) {print $i} } } '  file1 file2_wget

Note: running grep inside a while loop is inefficient, because every iteration has to invoke grep on file2 again.

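@OP, a rough explanation: for the meaning of FNR and NR, see the gawk manual. FNR==NR{a[$1];next} puts the contents of file1 into the array a. When FNR is not equal to NR (which means the second file is now being read), it checks whether each word in the file is in the array a. If it is, the word is printed. (The for loop iterates over each word on the line.)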
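To make the FNR==NR idiom concrete, here is a small sketch using made-up sample files in a temporary directory (the file contents are assumptions for illustration, not the poster's data). NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while the first file is being read:

```shell
tmp=$(mktemp -d)
printf 'Afghans\nAfricans\nAlaskans\n' > "$tmp/file1"
printf 'many Africans and\nsome Alaskans too\n' > "$tmp/file2_wget"

found=$(awk 'FNR==NR { a[$1]; next }        # first file: remember each word
             { for (i = 1; i <= NF; i++)    # second file: test every field
                 if ($i in a) print $i }' "$tmp/file1" "$tmp/file2_wget")

echo "$found"
rm -r "$tmp"
```

With this sample data the script prints Africans and Alaskans, one per line. Note that this matches whole whitespace-separated fields, so a word glued to punctuation or markup in the wget output would not be found this way.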

score 2 · Accepted Answer


Use more quotes and less cat:

while IFS= read -r LINE; do 
  grep "$LINE" file2.txt
done < file1.txt
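A minimal sketch of why the quotes matter, assuming file1.txt has a line with more than one word (the sample line below is made up; the full file is not shown in the question). Unquoted, $LINE is split on whitespace into separate arguments, so grep takes the first word as the pattern and the second as a file name, which would explain the "No such file or directory" message:

```shell
# Hypothetical sample line for illustration only.
LINE='Asian Chinese'

# Print how many arguments a command would receive.
count_args() { echo "$#"; }

unquoted=$(count_args $LINE file2.txt)     # split into: Asian, Chinese, file2.txt
quoted=$(count_args "$LINE" file2.txt)     # one pattern argument plus file2.txt

echo "unquoted: $unquoted arguments"
echo "quoted: $quoted arguments"
```

The unquoted call passes three arguments and the quoted call passes two.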

Answered 2011-04-11T19:25:55.963

score 1 · Accepted Answer

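In addition to the quoting problem, the file you downloaded contains CRLF line endings, which are throwing read off (each value of LINE ends in a stray carriage return). Use dos2unix to convert file1.txt before iterating over it.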
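A sketch of the effect, using small made-up files in a temporary directory (names and contents are assumptions for illustration). With CRLF endings, each pattern ends in a carriage return that never occurs in the data, so nothing matches. If dos2unix is not installed, stripping the trailing CR inside the loop is one possible alternative:

```shell
tmp=$(mktemp -d)
printf 'Afghans\r\nAfricans\r\n' > "$tmp/file1.txt"          # CRLF line endings
printf 'Some Africans and Afghans live here\n' > "$tmp/file2.txt"

cr=$(printf '\r')   # a literal carriage return

# Without stripping: every pattern carries a trailing CR.
raw=0
while IFS= read -r LINE; do
  if grep -q -- "$LINE" "$tmp/file2.txt"; then raw=$((raw + 1)); fi
done < "$tmp/file1.txt"

# With the trailing CR removed before calling grep.
clean=0
while IFS= read -r LINE; do
  LINE=${LINE%"$cr"}
  if grep -q -- "$LINE" "$tmp/file2.txt"; then clean=$((clean + 1)); fi
done < "$tmp/file1.txt"

echo "matches with CR: $raw, without CR: $clean"
rm -r "$tmp"
```

The stray CR would also account for the garbled echo output in the question, since a carriage return moves the cursor back to the start of the line.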

score 1 · Accepted Answer

While using awk is faster, grep can produce more detail with less effort. So, after running dos2unix, use:

grep -F -i -n -f <file_containing_pattern> <file_containing_data_blob>

You will get all matching lines plus their line numbers (case-insensitive).

At a minimum, this is enough to find all the words from file_containing_pattern:

grep -F -f <file_containing_pattern> <file_containing_data_blob>
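To show what the two commands print, here is a sketch with made-up pattern and data files (the names patterns.txt and blob.txt and their contents are assumptions for illustration):

```shell
tmp=$(mktemp -d)
printf 'Afghans\nAfricans\n' > "$tmp/patterns.txt"
printf 'nothing here\nsome AFRICANS and\nthe Afghans too\n' > "$tmp/blob.txt"

# Fixed strings, case-insensitive, with line numbers.
with_numbers=$(grep -F -i -n -f "$tmp/patterns.txt" "$tmp/blob.txt")

# Fixed strings only: case-sensitive, no line numbers.
plain=$(grep -F -f "$tmp/patterns.txt" "$tmp/blob.txt")

echo "$with_numbers"
echo "$plain"
rm -r "$tmp"
```

The first command reports lines 2 and 3, each prefixed with its line number. The second reports only "the Afghans too", because AFRICANS differs in case. Both print whole matching lines; adding -o would print only the matched words.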
