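So I have a list of keywords, and I'm trying to check whether any of them appear in a row of my CSV file; if one does, the row should be flagged. My code works perfectly, except that a row containing more than one keyword doesn't get flagged. Any ideas?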

import sys
import csv
nk = ('aaa','bbb','ccc')
with open(sys.argv[1], "rb") as f:
    reader = csv.reader(f, delimiter = '\t')
    for row in reader:
        string=str(row)
        if any(word in string for word in nk):
            row.append('***')
            print '\t'.join(row)
        else:
            print '\t'.join(row)

Thanks in advance!

1 Answer

Use set intersection to get all the words that `nk` and the row have in common:

nk = {'aaa','bbb','ccc'}
seen = set()             #keeps track of the keywords seen so far
with open(sys.argv[1], "rb") as f:
    ...
    for row in reader:
        #update `seen` with the items found common between `nk` and the current `row`
        seen.update(nk.intersection(row))
    ...
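
The elided parts of the snippet above could be filled in along these lines. This is only a sketch under a few assumptions that are not in the original: it targets Python 3 (the question's code is Python 2), the helper name `flag_rows` and the sample data are made up for illustration, and an in-memory `io.StringIO` stands in for the file passed as `sys.argv[1]`.

```python
import csv
import io

KEYWORDS = {"aaa", "bbb", "ccc"}  # same keywords as `nk` above


def flag_rows(lines, keywords=KEYWORDS):
    """Return (rows, seen). A row gets '***' appended when at least one
    of its cells is exactly equal to a keyword."""
    seen = set()  # every keyword found in any row so far
    rows = []
    for row in csv.reader(lines, delimiter="\t"):
        found = keywords.intersection(row)  # keywords that appear as whole cells
        seen.update(found)
        if found:
            row = row + ["***"]
        rows.append(row)
    return rows, seen


# Hypothetical sample input; a real script would open sys.argv[1] instead.
sample = io.StringIO("aaa\tx\tbbb\nfoo\tbar\nxaaax\tccc\n")
rows, seen = flag_rows(sample)
for row in rows:
    print("\t".join(row))
```

With this sample, the first row (two keywords) and the third row (one keyword) are flagged, and the second is not. Note that the intersection compares whole cells, so `xaaax` on its own would not count as a match for `aaa`.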

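Don't convert row to a string (string=str(row)). The in operator also works on lists, and it behaves differently there than on strings: on a string it does a substring search, while on a list it checks whether any item is equal to the operand: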

>>> strs = "['foo','abarc']"
>>> 'bar' in strs            #substring search
True
>>> lis = ['foo','abarc']
>>> 'bar' in lis             #item search
False
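
If substring matching inside each cell is what is actually wanted, it can be done per cell without stringifying the whole row. The helper below is a hypothetical illustration (the name `row_has_keyword` and its `substring` flag are not from the original) that shows both behaviours side by side, written for Python 3:

```python
def row_has_keyword(row, keywords, substring=False):
    """Return True if the row contains a keyword.

    substring=False: a cell must be exactly equal to a keyword (item search).
    substring=True:  a keyword may appear anywhere inside a cell.
    """
    if substring:
        return any(word in cell for cell in row for word in keywords)
    return not set(keywords).isdisjoint(row)


row = ["foo", "abarc"]
print(row_has_keyword(row, {"bar"}))                  # False
print(row_has_keyword(row, {"bar"}, substring=True))  # True
```

Which mode is right depends on whether the keywords are meant to be whole field values or fragments of longer text.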
answered 2013-09-06T14:51:17.823