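Edit: One of the main problems with the code below comes from storing regular expression objects in a dictionary, and from how they are then accessed to check whether they match another string. I will still leave my earlier question up, though, because I think there may be a simple way to do all of this.
I would like to find a method in Python that returns a boolean indicating whether two strings refer to the same thing. I know this is difficult, if not completely absurd in programming, but I am looking at handling it with dictionaries of alternative strings that refer to the same thing.
Here are some examples, since I know this does not make much sense without them.
If I give the string: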
'breakingBad.Season+01 Episode..02'
then I would like it to match the string:
'Breaking Bad S01E02'
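Or 'three.BuCkets+of H2O' could match '3 buckets of water'.
I know this is nearly impossible for equivalent synonyms such as '3' and 'water', but if need be I am willing to supply these to the function as a dictionary of the relevant regex synonyms.
I have a feeling there is a simpler way to do this in Python, as there usually is, but this is what I have so far: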
import re

def check_if_match(given_string, string_to_match, alternative_dictionary):
    print 'matching: ', given_string, ' against: ', string_to_match
    # split the string into its parts with pretty much any special character
    list_of_given_strings = re.split(' |\+|\.|;|,|\*|\n', given_string)
    print 'List of words retrieved from given string: '
    print list_of_given_strings
    check = False
    counter = 0
    for i in range(len(list_of_given_strings)):
        m = re.search(list_of_given_strings[i], string_to_match, re.IGNORECASE)
        m_alt = None
        try:
            m_alt = re.search(alternative_dictionary[list_of_given_strings[i]], string_to_match, re.IGNORECASE)
        except KeyError:
            pass
        if m or m_alt:
            if counter == len(list_of_given_strings)-1: check = True
            else: counter += 1
            print list_of_given_strings[i], ' found to match'
        else:
            print list_of_given_strings[i], ' did not match'
            break
    return check

string1 = 'breaking Bad.Season+01 Episode..02'
other_string_to_check = 'Breaking.Bad.S01+E01'
# make a dictionary of synonyms - here we should be saying that "S01" is equivalent to "Season 01"
alternative_dict = {re.compile(r'S[0-9]',flags=re.IGNORECASE):re.compile(r'Season [0-9]',flags=re.IGNORECASE),\
                    re.compile(r'E[0-9]',flags=re.IGNORECASE):re.compile(r'Episode [0-9]',flags=re.IGNORECASE)}
print check_if_match(string1, other_string_to_check, alternative_dict)
print

# another try
string2 = 'three.BuCkets+of H2O'
other_string_to_check2 = '3 buckets of water'
alternative_dict2 = {'H2O':'water', 'three':'3'}
print check_if_match(string2, other_string_to_check2, alternative_dict2)
This returns:
matching: breaking Bad.Season+01 Episode..02 against: Breaking.Bad.S01+E01
List of words retrieved from given string:
['breaking', 'Bad', 'Season', '01', 'Episode', '', '02']
breaking found to match
Bad found to match
Season did not match
False
matching: three.BuCkets+of H2O against: 3 buckets of water
List of words retrieved from given string:
['three', 'BuCkets', 'of', 'H2O']
three found to match
BuCkets found to match
of found to match
H2O found to match
True
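One detail in the first word list above is the empty string between 'Episode' and '02'. The following is only a small sketch (written in Python 3 syntax, which is an assumption since the code above uses Python 2 print statements; the variable names are made up for illustration) of where that token comes from, and of why it would always count as a match if the loop ever reached it:

```python
import re

given = 'breaking Bad.Season+01 Episode..02'

# The same split as in check_if_match: two separators in a row ("..")
# leave an empty string between them.
split_tokens = re.split(r' |\+|\.|;|,|\*|\n', given)

# Collecting runs of letters and digits instead never yields an empty token.
found_tokens = re.findall(r'[A-Za-z0-9]+', given)

# An empty pattern matches at position 0 of any string,
# so the '' token would be reported as "found to match".
empty_always_matches = re.search('', 'Breaking.Bad.S01+E01') is not None

print(split_tokens)
print(found_tokens)
print(empty_always_matches)
```

The words are also passed to re.search as patterns without escaping, so a word containing regex metacharacters would be interpreted as a pattern rather than as literal text; re.escape may be worth considering there.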
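I realize this probably means something is wrong with my dictionary keys and values, but I feel like I am drifting further and further from a simple Pythonic solution that has probably already been created.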
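To make the suspected key problem concrete, here is a minimal sketch (Python 3 syntax; lookup_by_string and lookup_by_scanning are hypothetical helper names, not part of the code above). It assumes the issue is that the dictionary is indexed with a plain word while its keys are compiled pattern objects, so the lookup raises KeyError and the try/except quietly hides it:

```python
import re

# Same shape as alternative_dict above: compiled patterns as keys and values.
alternative_dict = {
    re.compile(r'S[0-9]', flags=re.IGNORECASE):
        re.compile(r'Season [0-9]', flags=re.IGNORECASE),
}

def lookup_by_string(word, table):
    """Mimic the try/except in check_if_match: index the dict with the raw word."""
    try:
        return table[word]
    except KeyError:
        # A str is never equal to a compiled pattern key, so we always end up here.
        return None

def lookup_by_scanning(word, table):
    """Hypothetical alternative: test each compiled key against the word."""
    for key_pattern, value_pattern in table.items():
        if key_pattern.search(word):
            return value_pattern
    return None

print(lookup_by_string('S01', alternative_dict))    # None
print(lookup_by_scanning('S01', alternative_dict))  # the 'Season [0-9]' pattern object
```

Even with scanning, the per-word approach still looks awkward here, because the split separates 'Season' from '01' before any synonym rule gets to see them together.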
Does anyone have any ideas?
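For reference, the kind of simpler approach I have in mind might look like the sketch below. It is not a known library solution, only an illustration in Python 3 syntax: RULES, normalize, and same_thing are hypothetical names, and the rewrite rules are assumptions that cover just the two examples in this question. The idea is to rewrite both strings to one canonical form with re.sub and compare the results, rather than searching for each word separately.

```python
import re

# Hypothetical rewrite rules: (pattern, replacement) pairs,
# applied in order to the lowercased string.
RULES = [
    (r'\bseason\W*(\d+)', r's\1'),    # "season+01"   -> "s01"
    (r'\bepisode\W*(\d+)', r'e\1'),   # "episode..02" -> "e02"
    (r'\bthree\b', '3'),
    (r'\bh2o\b', 'water'),
]

def normalize(text, rules=RULES):
    """Lowercase, apply the rewrite rules, then drop everything that is not a letter or digit."""
    text = text.lower()
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return re.sub(r'[^a-z0-9]+', '', text)

def same_thing(a, b, rules=RULES):
    """True when both strings normalize to the same canonical key."""
    return normalize(a, rules) == normalize(b, rules)

print(same_thing('breakingBad.Season+01 Episode..02', 'Breaking Bad S01E02'))
print(same_thing('three.BuCkets+of H2O', '3 buckets of water'))
print(same_thing('breaking Bad.Season+01 Episode..02', 'Breaking.Bad.S01+E01'))
```

Dropping all separators makes 'breakingBad' and 'Breaking Bad' compare equal, at the cost of losing word boundaries, so unrelated strings that happen to concatenate to the same key would also compare equal. The sketch also treats 'Season 1' and 'S01' as different, since it does not strip leading zeros.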