python - Compare two email lists where the same email may appear on different lines

Question

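I'm trying to write some code to compare two lists, each of which contains email addresses. However, a line-by-line comparison is not an option, because the same email in list1 can be on a different line number in list2.

I am using this approach: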

F1 = open("c:\\FILEA.txt", "r").read().split('\n')
F2 = open("c:\\FILEB.txt", "r").read().split('\n')

lines1 = filter(None, (line.rstrip() for line in sorted([n.lower() for n in F1])))
lines2 = filter(None, (line.rstrip() for line in sorted([n.lower() for n in F2])))


for i in ( i for i in lines1 if lines2[:2] == lines1[:2]):
    print i
    break

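The above is just an example, and it only compares line by line. Does anyone know how to take each email in list1 and check whether it exists in list2?

Many thanks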

score 4 · Accepted Answer

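If you only want to see whether an email from one list is in the other (and don't care about frequency, etc.), you could try using sets to store the unique entries from each file and then take the intersection of the two sets, which gives the emails present in both files (note that a with statement that opens two files is a Python 2.7+ feature):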

>>> l1 = set()
>>> l2 = set()
>>> with open('FILEA.txt', 'rb') as f1, open('FILEB.txt', 'rb') as f2:
...     for line in f1.readlines():
...         l1.add(line.strip())
...     for line in f2.readlines():
...         l2.add(line.strip())
... 
>>> 
>>> l1
set(['another@gmail.com', 'andanother@hotmail.com', 'this@email.com'])
>>> l2
set(['unique@somehost.com', 'this@email.com', 'not@example.com'])
>>> l1 & l2
set(['this@email.com'])
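
The same idea as a small reusable sketch, written for Python 3 (an assumption; the session above is Python 2). The helper name normalize is hypothetical, and lower-casing is assumed to be wanted only because the question's own attempt lower-cases each line. It accepts any iterable of lines, so an open file object works as well as a list:

```python
def normalize(lines):
    """Return the set of non-empty lines, stripped and lower-cased.

    `lines` can be any iterable of strings, e.g. an open text file.
    """
    return {line.strip().lower() for line in lines if line.strip()}


# Stand-ins for the contents of the two files (hypothetical sample data).
list_a = ["this@email.com\n", "Another@Gmail.com\n", "\n"]
list_b = ["not@example.com\n", "THIS@email.com \n"]

# Line order no longer matters: set intersection ignores position.
common = normalize(list_a) & normalize(list_b)
print(sorted(common))  # ['this@email.com']
```

With real files, this would be called as normalize(f) inside a with open(...) as f block.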

With sets, you can also perform other (potentially) useful operations:

Items that appear in either set (union):

>>> l1 | l2
set(['another@gmail.com', 'unique@somehost.com', 'andanother@hotmail.com', 'this@email.com', 'not@example.com'])

Items in one set but not in the other (difference):

>>> l1 - l2
set(['another@gmail.com', 'andanother@hotmail.com'])
>>> l2 - l1
set(['not@example.com', 'unique@somehost.com'])

Items unique to each set (think of it as the union minus the intersection) (symmetric difference):

>>> l1 ^ l2
set(['another@gmail.com', 'not@example.com', 'unique@somehost.com', 'andanother@hotmail.com'])
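
The "union minus intersection" description can be checked directly. A minimal sketch in Python 3 syntax (an assumption; the session above is Python 2), reusing the sample addresses from this answer:

```python
l1 = {'this@email.com', 'another@gmail.com', 'andanother@hotmail.com'}
l2 = {'not@example.com', 'this@email.com', 'unique@somehost.com'}

# Symmetric difference equals the union with the shared items removed...
assert (l1 | l2) - (l1 & l2) == l1 ^ l2
# ...and also the two one-sided differences combined.
assert (l1 - l2) | (l2 - l1) == l1 ^ l2

print(sorted(l1 ^ l2))
# ['andanother@hotmail.com', 'another@gmail.com', 'not@example.com', 'unique@somehost.com']
```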

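Finally, you can also use methods instead of operators to perform these operations. To use a method, take one set, call the method named in parentheses above (union, difference, and so on) on it, and pass the other set as the argument: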

>>> l1.intersection(l2)
set(['this@email.com'])
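
For reference, a sketch of all four method forms in Python 3 syntax (an assumption; the session above is Python 2). The variable names are illustrative only. One practical difference: the methods accept any iterable as the argument, while the operators require both operands to be sets:

```python
l1 = {'this@email.com', 'another@gmail.com', 'andanother@hotmail.com'}
l2 = {'not@example.com', 'this@email.com', 'unique@somehost.com'}

common = l1.intersection(l2)                  # same as l1 & l2
either = l1.union(l2)                         # same as l1 | l2
only_in_l1 = l1.difference(l2)                # same as l1 - l2
only_in_one_file = l1.symmetric_difference(l2)  # same as l1 ^ l2

# Methods also take a plain list (or any iterable); l1 & [...] would raise TypeError.
from_list = l1.intersection(['this@email.com', 'unique@somehost.com'])

print(sorted(common))     # ['this@email.com']
print(sorted(from_list))  # ['this@email.com']
```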

My files look like this:

FILEA.txt

this@email.com
another@gmail.com
andanother@hotmail.com

FILEB.txt

not@example.com
this@email.com
unique@somehost.com
