python - How to check whether items from a dictionary exist in a CSV file?

Question

I have a dictionary and a CSV file (actually tab-delimited):

dict1:

{1 : ['Charles', 22],
2: ['James', 36],
3: ['John', 18]}

data.csv:


[ 22 | Charles goes to the cinema | Activity    ]
[ 46 | John is a butcher          | Profession  ]
[ 95 | Charles is a firefighter   | Profession  ]
[ 67 | James goes to the zoo      | Activity    ]

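I want to take the string (the name) in the first item of each of dict1's values and search for it in the second column of the CSV. If the name appears in a sentence, I want to print the first (and only the first) such sentence.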

But I'm having trouble with the search: how do I access the column/row data while iterating over dict1? I tried something like this:

with open('data.csv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file, delimiter='\t')
    for (id, (name, age)) in dict1.items():
        if name in reader.row[1] # reader.row[1] is wrong!!!
        print(reader.row[1])

score 1 · Accepted Answer

Yes, roganjosh is right. A better approach is to iterate over the CSV file and look for any of the names.

import csv

# Names still to be found: the first item of each value in dict1
requested = {d[0] for d in dict1.values()}
with open('/tmp/f.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]  # second column holds the sentence
        found = {n for n in requested if n in sentence}
        for n in found:
            print(f'{n}: {sentence}')
        requested -= found  # each name is reported only once
        if not requested:  # optimization, all names used
            break

EDIT: answered the actual question, not the one I imagined.

EDIT2: after clarification (and some new requirements)... I hope I got it right.

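Only the sentence is printed for each matching row. It does not check whether the same sentence also appears in another row. You can use a set() to keep the matched sentences and print them after the CSV file has been processed.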
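A minimal sketch of that set() idea, not part of the original answer. It assumes the sample rows from the question, fed through io.StringIO instead of a real file, and uses the same word-boundary matching as the code further down. The name matched is made up for illustration.

```python
import csv
import io
import re

# Sample data taken from the question, tab-separated (assumption: no header row).
dict1 = {1: ['Charles', 22], 2: ['James', 36], 3: ['John', 18]}
data = (
    "22\tCharles goes to the cinema\tActivity\n"
    "46\tJohn is a butcher\tProfession\n"
    "95\tCharles is a firefighter\tProfession\n"
    "67\tJames goes to the zoo\tActivity\n"
)

requested = {re.compile(r'\b' + re.escape(d[0]) + r'\b') for d in dict1.values()}
matched = set()  # a set stores each distinct sentence only once

for row in csv.reader(io.StringIO(data), delimiter='\t'):
    sentence = row[1]
    found = {r for r in requested if r.search(sentence)}
    if found:
        requested -= found
        matched.add(sentence)  # collect instead of printing right away
    if not requested:
        break

# Print only after the whole file has been processed.
# A set has no defined order, so sort for stable output.
for sentence in sorted(matched):
    print(sentence)
```

Since a set drops insertion order, a list plus a membership check would be the alternative if the sentences have to come out in file order.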

I used regular expressions to match whole words rather than any substring.

import csv
import re

requested = {re.compile(r'\b' + re.escape(d[0]) + r'\b') for d in dict1.values()}
with open('/tmp/f.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        found = {n for n in requested if n.search(sentence)}
        if found:
            requested -= found
            print(sentence)
        if not requested:
            break
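To illustrate why the word-boundary match above can matter, here is a small comparison with a plain substring test. The sentence is made up and is not in the question's data.

```python
import re

name = 'John'
sentence = 'Johnson is a butcher'  # hypothetical row, not from data.csv

# Plain substring test: 'John' is found inside 'Johnson'.
substring_hit = name in sentence

# Whole-word test: \b requires a word boundary on both sides of the name.
word_hit = re.search(r'\b' + re.escape(name) + r'\b', sentence) is not None

print(substring_hit, word_hit)
```

The substring test reports a match here while the whole-word test does not, which is the difference between the first version of the answer and the EDIT2 version.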

EDIT3: recovering the matched names (a new requirement, just like in a real development project :-P)

First, note that a sentence can match more than one name (see len(found)).

In the previous example, you can recover the names from the compiled regular expressions (because r'\b' was added before and after each name):

found_names = [r.pattern[2:-2] for r in found]

But I don't think this is the best way.
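One possible reason to avoid slicing the pattern, sketched here with a made-up name: re.escape() rewrites regex metacharacters, so the sliced pattern is the escaped form and not necessarily the original name.

```python
import re

name = 'J. Smith'  # hypothetical name containing a regex metacharacter ('.')
pattern = re.compile(r'\b' + re.escape(name) + r'\b')

# Strip the leading and trailing r'\b' (two characters each).
recovered = pattern.pattern[2:-2]

print(repr(recovered))    # the escaped form, with backslashes inserted
print(recovered == name)  # compares the escaped form with the original
```

For names made only of letters, such as those in dict1, the slice does return the original name, so this only bites with punctuation or other special characters.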

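A better way is to store the original name in requested alongside the compiled pattern. I decided to use a set of tuples. Set operations are very fast.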

requested = {(re.compile(r'\b' + re.escape(d[0]) + r'\b'), d[0])
             for d in dict1.values()}
with open('/tmp/f.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        found = {(r, n) for r, n in requested if r.search(sentence)}
        if found:
            found_names = tuple(n for r, n in found)
            print(found_names, sentence)
            requested -= found
        if not requested:
            break

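Now the matched names (the original d[0]) are in the tuple found_names. You can use it however you need. For example, to turn it into a string (replace the found_names = line and the print line):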

found_names = ', '.join(n for r, n in found)
print(f'{found_names}: {sentence}')
