I have long lists of words and regular expression patterns in .txt files, which I read in like this:
with open(fileName, "r") as f1:
    pattern_list = f1.read().split('\n')
To illustrate, the first seven look like this:
print(pattern_list[:7])
# ['abandon*', 'abuse*', 'abusi*', 'aching', 'advers*', 'afraid', 'aggress*']
I want to know whenever a word in an input string matches any of the words/patterns in pattern_list. The code further down sort of works, but I see two problems:
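- First, it seems pretty inefficient to re.compile() every item in my pattern_list each time I check a new string_input. But when I tried storing the re.compile(raw_str) objects in a list (so that I could reuse the already-compiled regexes for something more like
if w in regex_compile_list:
), it didn't work properly.
- Second, it sometimes doesn't work the way I expect. Notice how
  - abuse* matches abusive
  - abusi* matches abused and abuse
  - ache* matches aching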
What am I doing wrong, and how can I be more efficient? Thanks in advance for your patience with a noob, and for any insights!
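A minimal sketch of what may be behind the second problem, assuming the trailing * in the list was meant as a shell-style wildcard ("any ending") rather than as regex syntax. In a regular expression, * means "zero or more of the preceding character", and match() only anchors at the start of the string. The helper wildcard_to_regex is hypothetical, not something from the code in this post:

```python
import re

# 'abuse*' as a regex is 'abus' followed by zero or more 'e' characters,
# and re.match() succeeds as soon as a prefix of the word matches.
print(bool(re.match('abuse*', 'abusive')))  # True: 'abus' + zero 'e'
print(bool(re.match('abusi*', 'abuse')))    # True: 'abus' + zero 'i'
print(bool(re.match('ache*', 'aching')))    # True: 'ach' + zero 'e'


def wildcard_to_regex(raw):
    # Hypothetical helper: treat a trailing '*' as "any further word characters".
    stem = raw.rstrip('*')
    suffix = r'\w*' if raw.endswith('*') else ''
    return re.compile(re.escape(stem) + suffix)


# fullmatch() requires the whole word to match, not just a prefix.
print(bool(wildcard_to_regex('abuse*').fullmatch('abused')))   # True
print(bool(wildcard_to_regex('abuse*').fullmatch('abusive')))  # False
```

Under that assumption, abuse* would cover abuse and abused but no longer abusive, which seems closer to the intent of the list.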
string_input = "People who have been abandoned or abused will often be afraid of adversarial, abusive, or aggressive behavior. They are aching to abandon the abuse and aggression."
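import re

for raw_str in pattern_list:
    pat = re.compile(raw_str)  # compiled again on every pass over pattern_list
    for w in string_input.split():
        if pat.match(w):  # match() only anchors at the start of w
            print("matched:", raw_str, "with:", w)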
#matched: abandon* with: abandoned
#matched: abandon* with: abandon
#matched: abuse* with: abused
#matched: abuse* with: abusive,
#matched: abuse* with: abuse
#matched: abusi* with: abused
#matched: abusi* with: abusive,
#matched: abusi* with: abuse
#matched: ache* with: aching
#matched: aching with: aching
#matched: advers* with: adversarial,
#matched: afraid with: afraid
#matched: aggress* with: aggressive
#matched: aggress* with: aggression.
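For the first problem, one possible approach is to compile the list once and reuse it for every input string. Note that `if w in regex_compile_list` compares the string w with the compiled pattern objects for equality, which is never true. Each pattern's match method has to be called instead. The sketch below assumes the same wildcard reading of the trailing * and strips surrounding punctuation from each word. compile_patterns and find_matches are hypothetical names, not part of the original code:

```python
import re
import string


def compile_patterns(raw_patterns):
    # Compile every pattern exactly once and keep the raw text for reporting.
    compiled = []
    for raw in raw_patterns:
        if not raw:  # skip blank entries, e.g. from a trailing newline in the file
            continue
        stem = raw.rstrip('*')
        suffix = r'\w*' if raw.endswith('*') else ''  # trailing '*' read as a wildcard
        compiled.append((raw, re.compile(re.escape(stem) + suffix)))
    return compiled


def find_matches(compiled, text):
    # Return (raw_pattern, word) pairs where the whole word matches a pattern.
    hits = []
    for word in text.split():
        word = word.strip(string.punctuation)  # 'abusive,' becomes 'abusive'
        for raw, pat in compiled:
            if pat.fullmatch(word):
                hits.append((raw, word))
    return hits


compiled = compile_patterns(
    ['abandon*', 'abuse*', 'abusi*', 'aching', 'advers*', 'afraid', 'aggress*'])
for raw, word in find_matches(compiled, "They are aching to abandon the abuse, not the abusive."):
    print("matched:", raw, "with:", word)
```

The compiled list can then be passed to find_matches for each new string_input without calling re.compile() again. Matching here is case-sensitive. Lowercasing each word first would be one way to handle capitalized input.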