Yes, roganjosh is right. A better approach is to iterate over the CSV file once and look for any of the keys in each row.
import csv
# dict1 is the dictionary from the question; d[0] is the name to look for.
requested = {d[0] for d in dict1.values()}
with open('/tmp/f.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        found = {n for n in requested if n in sentence}  # plain substring match
        for n in found:
            print(f'{n}: {sentence}')
        requested -= found
        if not requested:  # optimization: every name has already been found
            break
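To try the first approach without the question's data, here is a minimal sketch. dict1 and the file contents are hypothetical stand-ins (an in-memory tab-separated string read through io.StringIO instead of /tmp/f.csv), and find_first_sentences is a hypothetical helper that returns the matches instead of only printing them.

```python
import csv
import io

# Hypothetical stand-in for the question's dict1: d[0] is the name to look for.
dict1 = {'a': ('Alice', 1), 'b': ('Bob', 2), 'c': ('Carol', 3)}

# Hypothetical tab-separated data; the sentence is in the second column.
tsv = "1\tAlice met Bob.\n2\tNobody here.\n3\tCarol left.\n4\tAlice again.\n"


def find_first_sentences(names, csvfile):
    """Return (name, sentence) pairs for the first row whose sentence contains each name."""
    requested = set(names)
    matches = []
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        found = {n for n in requested if n in sentence}  # substring match, as above
        for n in sorted(found):  # sorted only to make the output order stable
            matches.append((n, sentence))
        requested -= found
        if not requested:  # all names seen: the remaining rows are never read
            break
    return matches


result = find_first_sentences((d[0] for d in dict1.values()), io.StringIO(tsv))
for name, sentence in result:
    print(f'{name}: {sentence}')
```

Row 4 is never read, because all three names have already been found by row 3.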
Edit: now answering the question that was actually asked, not the one I imagined.
EDIT2: after the clarification (and some new requirements)... I hope I got it right this time.
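It prints just the sentence, once per matching row. It does not check whether the same sentence also appears on another row; if that matters, you can use a set() to collect the matching sentences and print them after the whole CSV file has been processed.
I use regular expressions so that only whole words match, not arbitrary substrings.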
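A quick illustration of the difference between the substring test used in the first version and the word-boundary regex; the sentence and names here are hypothetical examples.

```python
import re

sentence = 'Annabel and Bo went home.'

# Plain substring test: 'Ann' is found inside 'Annabel'.
substring_hit = 'Ann' in sentence
print(substring_hit)  # True

# \b needs a non-word character (or the start/end of the string) on each side,
# so 'Ann' inside 'Annabel' does not count as a match.
ann = re.compile(r'\b' + re.escape('Ann') + r'\b')
word_hit_ann = bool(ann.search(sentence))
print(word_hit_ann)  # False

bo = re.compile(r'\b' + re.escape('Bo') + r'\b')
word_hit_bo = bool(bo.search(sentence))
print(word_hit_bo)  # True
```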
import csv
import re
# One compiled pattern per name; \b ... \b restricts matches to whole words.
requested = {re.compile(r'\b' + re.escape(d[0]) + r'\b') for d in dict1.values()}
with open('/tmp/f.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        found = {n for n in requested if n.search(sentence)}
        if found:
            requested -= found  # each name is reported only for its first sentence
            print(sentence)
        if not requested:
            break
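The set() suggestion could look roughly like this: a minimal sketch in which the names and the in-memory tab-separated data are hypothetical stand-ins for dict1 and /tmp/f.csv, and collect_matching_sentences is a hypothetical helper name. Matching sentences are gathered in a set while reading and printed only once the loop is done.

```python
import csv
import io
import re


def collect_matching_sentences(names, csvfile):
    """Read tab-separated rows and return the matching sentences as a set."""
    requested = {re.compile(r'\b' + re.escape(n) + r'\b') for n in names}
    matched = set()  # a set stores each distinct sentence only once
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        found = {r for r in requested if r.search(sentence)}
        if found:
            requested -= found
            matched.add(sentence)
        if not requested:
            break
    return matched


# Hypothetical data; 'Ann' does not match inside 'Annabel' because of \b.
tsv = "1\tAnnabel met Bob.\n2\tAnn stayed.\n3\tCarol left.\n4\tBob waved.\n"
matched = sorted(collect_matching_sentences(['Ann', 'Bob', 'Carol'], io.StringIO(tsv)))

# Print only after the whole file has been processed.
for sentence in matched:
    print(sentence)
```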
EDIT3: recovering the matched names (a new requirement, just like in a real development project :-P)
First, note that a single sentence can match more than one name (see len(found)).
In the previous example you could recover the names from the compiled regexes, since r'\b' was added before and after each name:
found_names = [r.pattern[2:-2] for r in found]
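But I don't think that is the best approach: the pattern contains the re.escape()d name, so a name with regex special characters (such as '.') comes back in escaped form rather than as the original.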
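A small check of the slicing trick, using a hypothetical name that contains '.': the slice returns whatever re.escape() produced, which only equals the original when the name has no special characters.

```python
import re

# Hypothetical name containing a regex metacharacter ('.').
name = 'J.R.'
pattern = re.compile(r'\b' + re.escape(name) + r'\b')
recovered = pattern.pattern[2:-2]  # strip the leading and trailing r'\b'
print(recovered)          # J\.R\.
print(recovered == name)  # False

# A name without special characters survives the round trip unchanged.
plain = re.compile(r'\b' + re.escape('Alice') + r'\b')
plain_recovered = plain.pattern[2:-2]
print(plain_recovered)    # Alice
```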
A better approach is to store the original name in requested alongside its pattern. I decided to use a set of tuples, since set operations are very fast.
import csv
import re
# Keep each compiled pattern together with the original name it was built from.
requested = {(re.compile(r'\b' + re.escape(d[0]) + r'\b'), d[0])
             for d in dict1.values()}
with open('/tmp/f.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        sentence = row[1]
        found = {(r, n) for r, n in requested if r.search(sentence)}
        if found:
            found_names = tuple(n for r, n in found)  # the original d[0] values
            print(found_names, sentence)
            requested -= found
        if not requested:
            break
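A self-contained run of the (pattern, name) version, as a minimal sketch: dict1 and the tab-separated data are hypothetical stand-ins, and the results list is added only so the outcome can be inspected. The names are sorted because iterating over a set has no guaranteed order.

```python
import csv
import io
import re

# Hypothetical stand-ins for the question's dict1 and the CSV file.
dict1 = {'a': ('Alice',), 'b': ('Bob',), 'c': ('Carol',)}
tsv = "1\tAlice met Bob.\n2\tNobody here.\n3\tCarol left.\n"

# Each entry keeps the compiled pattern together with the original name.
requested = {(re.compile(r'\b' + re.escape(d[0]) + r'\b'), d[0])
             for d in dict1.values()}

results = []
for row in csv.reader(io.StringIO(tsv), delimiter='\t'):
    sentence = row[1]
    found = {(r, n) for r, n in requested if r.search(sentence)}
    if found:
        found_names = tuple(sorted(n for r, n in found))  # sorted for a stable order
        results.append((found_names, sentence))
        print(found_names, sentence)
        requested -= found
    if not requested:
        break
```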
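Now the found names (the original d[0] values) are in the tuple found_names, and you can use them however you like. For example, to turn them into a single string, replace the found_names = line and the print line with: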
found_names = ', '.join(n for r, n in found)
print(f'{found_names}: {sentence}')
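One detail with the joined string: found is a set, so the order of the names can differ between runs. A minimal sketch with a hypothetical sentence and names, sorting before joining:

```python
import re

sentence = 'Alice met Bob.'
# Hypothetical requested set in the (pattern, name) form used above.
requested = {(re.compile(r'\b' + re.escape(n) + r'\b'), n) for n in ('Bob', 'Alice', 'Carol')}
found = {(r, n) for r, n in requested if r.search(sentence)}

# Sort the names so the joined string is the same on every run.
found_names = ', '.join(sorted(n for r, n in found))
line = f'{found_names}: {sentence}'
print(line)  # Alice, Bob: Alice met Bob.
```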