shell - Compare two text files and count occurrences

Question

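I'm trying to write a blog post about the dangers of using common access point names.

So I did some work to collect a list of access point names, and I downloaded the list of the 1000 most common access point names from Renderlab (rainbow tables exist for these).

But how can I compare the two text files to see how many of the access point names I collected are open to a rainbow-table attack?

The text files are structured like this:

collected.txt: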

linksys
internet
hotspot

The most common access point names, in a file called SSID.txt:

default
NETGEAR
Wireless
WLAN
Belkin54g

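So the script should sort the lines, compare them, and show how many times lines from collected.txt are found in SSID.txt.

Does this make any sense? Any help would be greatly appreciated :)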

score 2 · Accepted Answer

If you don't mind using a Python script:

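from collections import Counter

# Load file 2 into a set so each name is matched as a whole line,
# not as a substring of the file's text
with open('SSID.txt') as ssid_file:
    common = {line.strip() for line in ssid_file if line.strip()}

found = Counter()                           # summary of found names
with open('collected.txt') as collected_file:
    for line in collected_file:
        name = line.strip()                 # drop the trailing newline
        if name in common:
            found[name] += 1

for name, count in found.items():
    print(count, name)                      # print number of occurrences and name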

...it can be run in the directory containing the two files (collected.txt and SSID.txt) and it will return a list like this:

5 NETGEAR
3 default
(...)

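The script reads file 1 line by line and compares each line against the whole of file 2. It can easily be modified to take the file names from the command line.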
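As a rough sketch of that modification (not part of the original script), the version below reads the two paths from sys.argv. The function name count_matches, the script name count_ssids.py, and the final total line are assumptions made for illustration. Matching is exact and case-sensitive after stripping surrounding whitespace.

```python
import sys
from collections import Counter


def count_matches(collected_path, ssid_path):
    """Count how often each collected name also appears in the SSID list."""
    with open(ssid_path) as ssid_file:
        common = {line.strip() for line in ssid_file if line.strip()}
    with open(collected_path) as collected_file:
        names = (line.strip() for line in collected_file)
        # The Counter is built before the file is closed
        return Counter(name for name in names if name in common)


def main(argv):
    # Hypothetical usage: python count_ssids.py collected.txt SSID.txt
    if len(argv) != 3:
        print("usage: count_ssids.py COLLECTED SSID_LIST", file=sys.stderr)
        return
    matches = count_matches(argv[1], argv[2])
    for name, count in matches.most_common():
        print(count, name)
    # Total number of collected entries that appear in the common list
    print("total:", sum(matches.values()))


if __name__ == "__main__":
    main(sys.argv)
```

The total on the last line is the figure the question asks for: how many of the collected names are on the common list.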

score 0 · Accepted Answer

To find the number of times each line in file A appears in file B, you can do:

awk 'FNR==NR { a[$0] = 1; next } $0 in a { count[$0]++ }
    END { for (i in a) print i, count[i] + 0 }' A B

If you want the output sorted, pipe the output to sort, but there's no need to sort just to find the counts. Note that the $0 in a clause can be omitted at the cost of consuming more memory, which may be a problem if file B is very large.
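For comparison, here is a minimal Python sketch of the same idea: remember the distinct lines of A, then count only those lines while scanning B. The function name count_in_b and the sample lists are made up for illustration, and it takes any iterables of lines (open file objects would work as well).

```python
from collections import Counter


def count_in_b(lines_a, lines_b):
    """For each distinct line in A, count how many times it occurs in B."""
    # dict.fromkeys keeps the distinct lines of A in their first-seen order
    wanted = dict.fromkeys(line.rstrip("\n") for line in lines_a)
    stripped_b = (raw.rstrip("\n") for raw in lines_b)
    # Only lines that occur in A are counted, so memory stays bounded by A
    counts = Counter(line for line in stripped_b if line in wanted)
    # Lines of A that never occur in B get a count of 0
    return {line: counts[line] for line in wanted}


if __name__ == "__main__":
    a = ["default\n", "NETGEAR\n", "WLAN\n"]  # sample data, not from the question
    b = ["NETGEAR\n", "linksys\n", "NETGEAR\n", "default\n"]
    for line, n in sorted(count_in_b(a, b).items()):
        print(line, n)
```

Filtering on membership in A while reading B plays the same role as the $0 in a clause in the awk version: lines that only occur in B are never stored.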

score 0 · Accepted Answer

First, take a look at a simple tutorial on the sdiff command, such as How do I Compare two files under Linux or UNIX. Notepad++ also supports this.
