I'm working on a project, but I haven't found any useful resources on how to match a multi-word substring against a string.
For example:
substring = "I can be found in this string"
in
string = "Now, I can be found in this string example"
I can't use the .find() method or regular expressions, and to make things more complicated, the edge cases include:
"reflexion mirror" does not match "'reflexion mirror'" but does match "(reflexion mirror)"
"maley" does not match "o'maley"
"luminate" matches "'''luminate"
"luminate" matches "luminate__"
"george" does not match "georges"
Whenever extra characters are wrapped around the words, as in "__hello world__" or "''hello world''", they should not interfere with matching "hello" or "world".
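One way to reconcile the cases above is to check the characters on either side of each candidate match. The sketch below assumes a rule that the question does not state outright: letters, digits, and a lone apostrophe count as part of a word, while a run of two or more apostrophes (as in "'''luminate") and everything else (underscores, brackets, whitespace) count as delimiters. The helper name is_word_char is hypothetical, and the scan is a plain slice comparison rather than Boyer-Moore, so it is O(n*m) in the worst case.

```python
def is_word_char(text: str, i: int) -> bool:
    """Return True if text[i] should count as part of a word.

    Assumed rule (not stated in the question): letters and digits are word
    characters, a lone apostrophe is a word character, and an apostrophe
    adjacent to another apostrophe is treated as markup (a delimiter).
    """
    if i < 0 or i >= len(text):
        return False  # off either end of the text: always a boundary
    ch = text[i]
    if ch.isalnum():
        return True
    if ch == "'":
        prev_is_quote = i > 0 and text[i - 1] == "'"
        next_is_quote = i + 1 < len(text) and text[i + 1] == "'"
        return not (prev_is_quote or next_is_quote)
    return False  # whitespace, underscores, brackets, other punctuation


def count_occurrences_in_text(word: str, text: str) -> int:
    """Count case-insensitive whole-word occurrences of word in text."""
    word, text = word.lower(), text.lower()
    m = len(word)
    if m == 0:
        return 0
    count = 0
    for start in range(len(text) - m + 1):
        if text[start:start + m] != word:
            continue
        # Accept the candidate only if both neighbours are delimiters.
        if not is_word_char(text, start - 1) and not is_word_char(text, start + m):
            count += 1
    return count
```

Under that assumed rule, count_occurrences_in_text("maley", "o'maley") gives 0 and count_occurrences_in_text("luminate", "'''luminate") gives 1.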
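I'm using Boyer-Moore to find valid substrings, which works apart from these seemingly conflicting edge cases. Oh, and I forgot to mention that the solution should prioritize performance in terms of time complexity.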
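For reference, here is a minimal sketch of the search step alone. It uses the Boyer-Moore-Horspool simplification (bad-character shifts only), not the full Boyer-Moore algorithm, and it avoids both .find() and regular expressions. The function name horspool_find_all is hypothetical. It returns every raw occurrence, so each hit would still need a separate boundary check to handle the edge cases.

```python
def horspool_find_all(pattern: str, text: str) -> list:
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: for each character in pattern[:-1], the distance
    # from its last occurrence to the final pattern position.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Slide by the table entry for the text character under the
        # pattern's last position; unseen characters allow a full jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

For example, horspool_find_all("luminate", "'''luminate") returns [3]. The average case skips many characters, but the worst case for this simplified variant is still O(n*m).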
I use word.translate({ord(c): None for c in string.whitespace}).lower() to preprocess my string and substring, and the result looks like this:
"asuggestionboxentryfrombobcarterdearanonymous,i'mnotquitesureiunderstandtheconceptofthis'anonymous'suggestionbox.ifnoonereadswhatwewrite,thenhowwillanythingeverchangebutinthespiritofgoodwill,i'vedecidedtooffermytwocents,andhopefullykevinwon'tstealit(ha,ha).iwouldreallyliketoseemorevarietiesofcoffeeinthecoffeemachineinthebreakroom.'milkandsugar','blackwithsugar','extrasugar'and'creamandsugar'don'toffermuchdiversity.also,theselectionofdrinksseemsheavilyweightedinfavorof'sugar'.whatifwedon'twantanysugar?"
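Note that deleting all whitespace also deletes the word boundaries, which makes cases like "george" versus "georges" impossible to tell apart afterwards. The sketch below contrasts the preprocessing quoted above with a hypothetical alternative, collapse_whitespace, that keeps a single space between words. Both function names are made up for illustration.

```python
import string


def strip_whitespace(s: str) -> str:
    # The preprocessing from the question: delete every whitespace character.
    return s.translate({ord(c): None for c in string.whitespace}).lower()


def collapse_whitespace(s: str) -> str:
    # Hypothetical alternative: squeeze each whitespace run to one space,
    # so word boundaries survive the normalization.
    return " ".join(s.split()).lower()


print(strip_whitespace("George S"))        # boundary between the words is lost
print(collapse_whitespace("George \n S"))  # boundary is kept as a single space
```

With the second form, a multi-word substring such as "reflexion mirror" can still be compared character by character, and the characters next to a match remain available for a boundary check.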
Any ideas on how to handle these edge cases?
Thanks
Edit
There is one caveat: ' is to be treated as a regular character (part of the word, not a delimiter).
Here is the unit test I gathered the edge cases from:
import unittest


class TestCountoccurrencesInText(unittest.TestCase):
    def test_count_occurrences_in_text(self):
        """
        Test the count_occurrences_in_text function
        """
        text = """Georges is my name and I like python. Oh ! your name is georges? And you like Python!
        Yes is is true, I like PYTHON
        and my name is GEORGES"""
        # test with a little text.
        self.assertEqual(3, count_occurrences_in_text("Georges", text))
        self.assertEqual(3, count_occurrences_in_text("GEORGES", text))
        self.assertEqual(3, count_occurrences_in_text("georges", text))
        self.assertEqual(0, count_occurrences_in_text("george", text))
        self.assertEqual(3, count_occurrences_in_text("python", text))
        self.assertEqual(3, count_occurrences_in_text("PYTHON", text))
        self.assertEqual(2, count_occurrences_in_text("I", text))
        self.assertEqual(0, count_occurrences_in_text("n", text))
        self.assertEqual(0, count_occurrences_in_text("reflexion mirror", "I am a senior citizen and I live in the Fun-Plex 'Reflexion Mirror' in Sopchoppy, Florida"))
        self.assertEqual(1, count_occurrences_in_text("Linguist", "'''Linguist Specialist Found Dead on Laboratory Floor'''"))