python - Parsing a string using Python?

Question

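Given a string such as 'helloyellowellow', parse out all the valid strings contained in it. (e.g. [[hell,hello,yellow],[low,low]........])

I'm looking for the most optimized way to write this code. Here is mine, but I'm not sure it's the best approach.

Full disclosure - this was an interview question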

master = []

# Dictionary lookup used to check whether a string is a word
def is_word(inputstr):
    # returns True/False (the lookup itself is left unspecified here)
    raise NotImplementedError


def processstring(fstr,secstr,li):
    if is_word(fstr): 
        li.append(fstr)
    if len(secstr) == 0:
        if len(li) != 0:
            master.append(li)
        return
    processstring(fstr + secstr[0], secstr[1:], li)



def wrapperprocess(inpstr):
    li = []
    if len(inpstr) == 0:
        return
    processstring('',inpstr,li)
    wrapperprocess(inpstr[1:])


wrapperprocess('helloyellowellow')
print(master)

score 3 · Accepted Answer

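Since you mention that you're looking for an efficient algorithm, and assuming you get the dictionary in advance (and not just as a callable predicate), you can use the Aho–Corasick algorithm.

Of course, if the input text is short, a simpler algorithm will be faster, since it avoids the "expensive" preprocessing of the dictionary.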
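For illustration, here is a minimal pure-Python sketch of the Aho–Corasick idea, assuming the dictionary is available up front as an iterable of strings. The names `build_automaton`, `find_words` and the sample word set in the usage lines are made up for this example, and nothing here is tuned for speed.

```python
from collections import deque


def build_automaton(words):
    """Build trie transitions, failure links and output lists for `words`."""
    goto = [{}]   # goto[state][char] -> next state (state 0 is the root)
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> dictionary words that end at this state

    # Step 1: insert every word into the trie.
    for word in words:
        if not word:
            continue  # skip empty strings, they would match everywhere
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Step 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_words(text, words):
    """Yield (start_index, word) for every dictionary word found in `text`."""
    goto, fail, out = build_automaton(words)
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            yield i - len(word) + 1, word


# Usage with a hypothetical sample dictionary:
sample_words = {"he", "hell", "hello", "yell", "yellow", "low", "well"}
matches = sorted(find_words("helloyellowellow", sample_words))
```

The text is scanned once, character by character, so the preprocessing only pays off when the dictionary is reused or the text is long.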

Also, an alternative Python answer: here is a simple way to just check every substring:

def gen_words(txt):
    n = len(txt)
    for i in range(n):
        for j in range(i+1, n+1):
            subtxt = txt[i:j]
            if is_word(subtxt):
                yield subtxt

For uniqueness, do:

all_words = set(gen_words(txt))
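The question asks for the words grouped by where they start, e.g. [[hell,hello,...],[low,...]]. A rough sketch of the same brute-force scan with that grouping follows. `SAMPLE_WORDS` and this `is_word` are hypothetical stand-ins for a real dictionary, and `words_by_start` is a name invented for the example.

```python
from collections import defaultdict

# Hypothetical stand-in dictionary; a real is_word would consult a full word list.
SAMPLE_WORDS = {"hell", "hello", "yell", "yellow", "low", "well"}


def is_word(s):
    """Return True if s is in the sample dictionary."""
    return s in SAMPLE_WORDS


def words_by_start(txt):
    """Return one list per start index that has matches, shortest word first."""
    groups = defaultdict(list)
    n = len(txt)
    for i in range(n):
        for j in range(i + 1, n + 1):
            subtxt = txt[i:j]
            if is_word(subtxt):
                groups[i].append(subtxt)
    # Order the groups by their start index in the text.
    return [groups[i] for i in sorted(groups)]


grouped = words_by_start("helloyellowellow")
```

With this sample dictionary, `grouped` holds one list for each position where at least one word begins, so repeated words such as "low" appear once per occurrence.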

score 2 · Accepted Answer

You can do something like this:

tgt='helloyellowellow'

with open('/usr/share/dict/words') as f:
    for word in f:
        word=word.strip()
        if word in tgt and len(word)>1:
            print(word)

Prints:

el
ell
he
hell
hello
lo
low
loy
ow
owe
we
well
ye
yell
yellow

If you are just looking for the is_word function, which is left undefined, you can use something like this:

def is_word(word, dic='/usr/share/dict/words'):
    if not hasattr(is_word, 'words'):
        with open(dic) as f:
            is_word.words={word.strip() for word in f}

    return word in is_word.words and len(word)>1

As a built-in data structure, a Python set has an average lookup time of O(1). You are unlikely to write something faster yourself.
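Since each set lookup is O(1) on average, most of the remaining cost is the number of substrings tested. One possible refinement, which is an addition here and not part of the answer above, is to skip substrings longer than the longest dictionary word. The function name `find_words_bounded` is invented for this sketch, and the word set is passed in explicitly rather than read from a file.

```python
def find_words_bounded(txt, words):
    """Return the distinct dictionary words that occur in txt.

    Only substrings up to the length of the longest dictionary word are tested.
    """
    words = set(words)
    if not words:
        return set()
    max_len = max(len(w) for w in words)
    found = set()
    n = len(txt)
    for i in range(n):
        # The end index never goes past the text or past max_len characters.
        for j in range(i + 1, min(i + max_len, n) + 1):
            subtxt = txt[i:j]
            if subtxt in words:
                found.add(subtxt)
    return found
```

This tests at most len(txt) * max_len substrings instead of roughly len(txt)**2 / 2, which matters mainly for long inputs.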

score 0 · Accepted Answer

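This is a well-solved problem.

Use the Wordnet package.

While parsing the given string, start at some index and keep advancing it; at each step, use wordnet to check whether the current substring exists as a word. It will tell you whether that particular substring is meaningful.

To install wordnet: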

https://pypi.python.org/pypi/Wordnet-bn/1.0
