python - Split a string into all possible ordered phrases

Question

I'm trying to explore what Python's built-in functions can do. Right now I'm working on something that takes a string such as:

'the fast dog'

and breaks it down into all possible ordered phrases, as lists. The example above would produce the following output:

[['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]

The key point is that the original order of the words in the string must be preserved when generating the possible phrases.

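I've managed to write a function that does this, but it's rather cumbersome and ugly. I was wondering whether some of Python's built-in functionality could help. I was thinking it might be possible to split the string at the various spaces and then apply that recursively to each split. Does anyone have any suggestions?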

score 10 · Accepted Answer

Use itertools.combinations:

import itertools

def break_down(text):
    words = text.split()
    ns = range(1, len(words))  # possible cut positions: 1..len(words)-1
    for n in ns:  # choose n cut positions, giving n+1 parts (2, 3, ..., len(words))
        for idxs in itertools.combinations(ns, n):
            yield [' '.join(words[i:j]) for i, j in zip((0,) + idxs, idxs + (None,))]

Example:

>>> for x in break_down('the fast dog'):
...     print(x)
...
['the', 'fast dog']
['the fast', 'dog']
['the', 'fast', 'dog']

>>> for x in break_down('the really fast dog'):
...     print(x)
...
['the', 'really fast dog']
['the really', 'fast dog']
['the really fast', 'dog']
['the', 'really', 'fast dog']
['the', 'really fast', 'dog']
['the really', 'fast', 'dog']
['the', 'really', 'fast', 'dog']
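
The zip call in break_down is the part that is easiest to misread: it pairs each cut index with the next one to form slice boundaries. Here is a minimal sketch of just that step. The helper name slice_bounds is hypothetical and not part of the answer above.

```python
def slice_bounds(idxs):
    """Return the (start, stop) pairs that break_down zips together.

    slice_bounds is a hypothetical helper used only for illustration.
    """
    # Prepend 0 as the first start; append None so the last slice runs to the end.
    return list(zip((0,) + idxs, idxs + (None,)))


words = 'the really fast dog'.split()
bounds = slice_bounds((1, 3))  # cut before word index 1 and before word index 3
phrases = [' '.join(words[i:j]) for i, j in bounds]
print(bounds)   # [(0, 1), (1, 3), (3, None)]
print(phrases)  # ['the', 'really fast', 'dog']
```

Each tuple from itertools.combinations(ns, n) is one choice of cut positions, so every way of cutting the word list is produced exactly once.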

score 4 · Accepted Answer

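Think about the gaps between the words. Each subset of that set of gaps corresponds to a set of split points, and therefore to one way of splitting the phrase: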

the fast dog jumps
   ^1   ^2  ^3     - these are split points

For example, the subset {1,3} corresponds to the split {"the", "fast dog", "jumps"}.

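The subsets can be enumerated as the binary numbers from 1 to 2^(L-1)-1, where L is the number of words (0 would be the empty subset, i.e. no split at all):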

001 -> the fast dog, jumps
010 -> the fast, dog jumps
011 -> the fast, dog, jumps
etc.
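
One possible reading of the table above in code, assuming the leftmost bit stands for gap 1 (which is what the three rows suggest) and that the phrase has at least one word. The function name splits_from_mask is made up for this sketch.

```python
def splits_from_mask(words, mask):
    """Split words according to a bitmask; the leftmost bit is gap 1.

    Hypothetical helper for illustration; assumes words is non-empty.
    """
    gaps = len(words) - 1
    result = [words[0]]
    for g in range(1, gaps + 1):
        # Extract the bit for gap g, counting gaps from the left as in the diagram.
        bit = (mask >> (gaps - g)) & 1
        if bit:
            result.append(words[g])       # split here: start a new phrase
        else:
            result[-1] += ' ' + words[g]  # no split: extend the current phrase
    return result


words = 'the fast dog jumps'.split()
# Masks 1 .. 2^(L-1)-1; mask 0 (no split at all) is skipped.
for mask in range(1, 2 ** (len(words) - 1)):
    print(format(mask, '03b'), '->', splits_from_mask(words, mask))
```

With this bit order, mask 101 gives the {1,3} split from the example above.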

score 3 · Accepted Answer

I'll elaborate on @grep's solution, using only built-ins as you asked in the question and avoiding recursion. You could implement his answer along these lines:

#! /usr/bin/python3

def partition (phrase):
    words = phrase.split () #split your phrase into words
    gaps = len (words) - 1 #one gap less than words (fencepost problem)
    for i in range (1 << gaps): #the 2^n possible partitions
        r = words [:1] #The result starts with the first word
        for word in words [1:]:
            if i & 1: r.append (word) #If "1" split at the gap
            else: r [-1] += ' ' + word #If "0", don't split at the gap
            i >>= 1 #Next 0 or 1 indicating split or don't split
        yield r #cough up r

for part in partition ('The really fast dog.'):
    print (part)
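
Note that range (1 << gaps) starts at 0, so the first result yielded is the unsplit phrase, which the question's expected output leaves out. A small variant that skips it, as a sketch only; the name partition_proper and the separate bits variable are my own additions, not part of the code above.

```python
def partition_proper(phrase):
    """Yield every split of phrase into two or more ordered parts.

    Hypothetical variant of partition() above that skips the unsplit phrase.
    """
    words = phrase.split()
    gaps = len(words) - 1
    # Start at 1 so the all-zero mask (no split) is never produced;
    # max() keeps the shift non-negative for an empty phrase.
    for i in range(1, 1 << max(gaps, 0)):
        bits = i
        r = words[:1]
        for word in words[1:]:
            if bits & 1:
                r.append(word)       # bit set: split at this gap
            else:
                r[-1] += ' ' + word  # bit clear: keep the words together
            bits >>= 1
        yield r


result = list(partition_proper('the fast dog'))
print(result)
# [['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]
```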

score 1 · Accepted Answer

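The operation you're asking for is usually called a "partition", and it can be done on any kind of list. So let's implement partitioning for an arbitrary list: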

def partition(lst):
    for i in range(1, len(lst)):  # range works on Python 2 and 3; xrange is Python 2 only
        for r in partition(lst[i:]):
            yield [lst[:i]] + r
    yield [lst]

Note that longer lists have a lot of partitions, so it's best implemented as a generator. To check that it works, try:

print(list(partition([1, 2, 3])))

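Now, you want to partition a string using words as the elements. The easiest way to do that is to split the text into words, run the original partitioning algorithm, and then join the groups of words back into strings: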

def word_partition(text):
    for p in partition(text.split()):
        yield [' '.join(group) for group in p]

Again, to test it, use:

print(list(word_partition('the fast dog')))
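
To put a number on "a lot of partitions": a list of n elements has 2^(n-1) of them, counting the unsplit list itself, which this generator yields last. A quick self-contained check, written for Python 3 (so it uses range); this is a sketch to verify the count, not a change to the approach above.

```python
def partition(lst):
    # Python 3 rendering of the recursive generator above.
    for i in range(1, len(lst)):
        for r in partition(lst[i:]):
            yield [lst[:i]] + r
    yield [lst]  # the unsplit list always comes last


# Compare the number of partitions with 2^(n-1) for small n.
counts = {n: sum(1 for _ in partition(list(range(n)))) for n in range(1, 8)}
for n, count in counts.items():
    print(n, count, 2 ** (n - 1))

small = list(partition([1, 2, 3]))
print(small[-1])  # [[1, 2, 3]]
```

If you want exactly the output from the question, drop that last item, since the question's list does not include the whole phrase as a single element.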
