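This is just some quick code I wrote that I think could do a decent job of pulling the words out of a fragment like the one you gave... It isn't fully thought through, but I think something along these lines would work if you can't find a prepackaged solution.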
# Hypothetical stand-in: replace this tiny set with a real English dictionary or dictionary API
ENGLISH_WORDS = {'like', 'we', 'said', 'to', 'what', 'you', 'want'}

def in_english_dict(word):
    return word.lower() in ENGLISH_WORDS

textstring = '''"likewesaid, we'lldowhatwecan. Trytoreconnectyou, towhatyouwant," said the Sheep Man. "Butwecan'tdoit-alone. Yougottaworktoo."'''
indiv_characters = list(textstring)  # splits string into individual characters
teststring = ''
sequential_indiv_word_list = []
for cur_char in indiv_characters:
    teststring = teststring + cur_char
    # test teststring against an English dictionary: True/False depending on whether it exists as an entry
    if in_english_dict(teststring):
        sequential_indiv_word_list.append(teststring)
        teststring = ''
# at the end, assemble a sentence from the pieces by putting a space between each word
sentence = ' '.join(sequential_indiv_word_list)
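The loop above assumes some `in_english_dict` lookup exists. One way to back it, sketched under the assumption that you have a plain-text word list with one word per line (many Unix-like systems ship one at `/usr/share/dict/words`, but that path is not guaranteed), is to load the list into a set once and close over it. The helper names here are hypothetical.

```python
from pathlib import Path

# Assumption: a one-word-per-line word list. /usr/share/dict/words exists on
# many Unix-like systems, but it is not guaranteed to be present.
DEFAULT_WORDLIST = Path("/usr/share/dict/words")


def words_from_lines(lines):
    """Normalize an iterable of lines into a lowercase set of words."""
    return {line.strip().lower() for line in lines if line.strip()}


def load_word_set(path=DEFAULT_WORDLIST):
    """Read a one-word-per-line file into a set for fast membership tests."""
    with open(path, encoding="utf-8") as f:
        return words_from_lines(f)


def make_in_english_dict(words):
    """Return a True/False lookup function over the given word set."""
    def in_english_dict(candidate):
        return candidate.lower() in words
    return in_english_dict
```

Usage would be something like `in_english_dict = make_in_english_dict(load_word_set())`. Real word lists often include single letters, which a shortest-match loop will grab first, so filtering out very short entries may be worth trying.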
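There are still some issues to work out. For example, if it never returns a match, this obviously won't work: it will just keep adding more characters and never match. But since your demo string has some spaces, you could also have it recognize those and automatically start over at each one.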
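A minimal sketch of restarting at every space, assuming the same shortest-match-first idea as the loop above and a caller-supplied `in_english_dict` function (hypothetical). Whatever is still in the buffer when a chunk ends, because nothing matched, is kept as-is instead of being silently dropped.

```python
def greedy_segment(chunk, in_english_dict):
    """Shortest-match-first segmentation of one space-free chunk."""
    pieces = []
    buffer = ''
    for ch in chunk:
        buffer += ch
        if in_english_dict(buffer):
            pieces.append(buffer)
            buffer = ''
    if buffer:  # nothing matched: keep the unmatched tail rather than lose it
        pieces.append(buffer)
    return pieces


def segment_text(text, in_english_dict):
    """Start the search over at every whitespace run, then rejoin with spaces."""
    words = []
    for chunk in text.split():  # split() with no argument splits on any whitespace run
        words.extend(greedy_segment(chunk, in_english_dict))
    return ' '.join(words)
```

Because each chunk is handled independently, a chunk that never matches only costs you that chunk, not the rest of the string.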
You would also need to account for punctuation by writing conditions such as
if cur_char == ',' or cur_char == '.':
    # start a new "word" automatically, e.g. by resetting the buffer
    teststring = ''
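Instead of checking characters one at a time, another option is to split the text into letter runs and standalone punctuation marks up front, so punctuation never ends up inside the buffer. This is a sketch using a regular expression; the pattern (which keeps an inner apostrophe, as in `can't`) and the `segment_word` callback are assumptions, not part of the approach above.

```python
import re

# Letter runs (allowing an inner apostrophe, e.g. "can't") or any single
# character that is neither whitespace nor an ASCII letter, such as , . " -
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*|[^\sA-Za-z]")


def tokenize(text):
    """Split text into letter runs and standalone punctuation marks."""
    return TOKEN_RE.findall(text)


def segment_with_punctuation(text, segment_word):
    """Segment only the letter runs; pass punctuation tokens through unchanged.

    segment_word is a hypothetical callback that turns one letter run into a
    list of words (for example, a dictionary-based segmenter).
    """
    out = []
    for tok in tokenize(text):
        if tok[0].isalpha():
            out.extend(segment_word(tok))
        else:
            out.append(tok)
    return out
```

The result is a token list rather than a joined string, so you can decide separately how to space punctuation when you rebuild the sentence.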