
I'm trying to explore the functionality of Python's built-in functions. I'm currently working on something that takes a string such as:

'the fast dog'

and breaks it down into all possible ordered phrases, as lists. For the example above, the output would be:

[['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]

The key point is that the original order of the words in the string must be preserved when generating the possible phrases.

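I've managed to write a function that does this, but it's rather cumbersome and ugly. I'm wondering whether some of Python's built-in functionality could help. I was thinking it might be possible to split the string at the various whitespace positions and then apply that recursively to each split. Does anyone have any suggestions?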
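A minimal sketch of the recursive idea described above, using only the built-in `str.split` with a `maxsplit` of 1 to peel off the first word. The function name `phrase_splits` is hypothetical. It also yields the unsplit phrase, which the expected output above leaves out, so that case is filtered afterwards:

```python
def phrase_splits(text):
    """Yield every ordered way to break `text` into phrases (including no break)."""
    parts = text.split(None, 1)  # peel off the first word; keep the rest as one string
    if len(parts) < 2:  # zero or one word left: at most one way to "split" it
        if parts:
            yield [parts[0]]
        return
    first, rest = parts
    for tail in phrase_splits(rest):  # recurse on the remaining words
        yield [first] + tail  # break after the first word...
        yield [first + " " + tail[0]] + tail[1:]  # ...or glue it onto the next phrase


splits = [p for p in phrase_splits("the fast dog") if len(p) > 1]
print(splits)
```

Each level of recursion doubles the number of results, so an L-word string produces 2^(L-1) splits before filtering.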


4 Answers


Use itertools.combinations:

import itertools

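def break_down(text):
    words = text.split()
    ns = range(1, len(words))  # candidate split points 1 .. len(words)-1
    for n in ns:  # choose n split points -> n + 1 parts (2, 3, ..., len(words))
        for idxs in itertools.combinations(ns, n):
            # pair each phrase start with the next split point (None = to the end)
            yield [' '.join(words[i:j]) for i, j in zip((0,) + idxs, idxs + (None,))]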

Example:

>>> for x in break_down('the fast dog'):
...     print(x)
...
['the', 'fast dog']
['the fast', 'dog']
['the', 'fast', 'dog']

>>> for x in break_down('the really fast dog'):
...     print(x)
...
['the', 'really fast dog']
['the really', 'fast dog']
['the really fast', 'dog']
['the', 'really', 'fast dog']
['the', 'really fast', 'dog']
['the really', 'fast', 'dog']
['the', 'really', 'fast', 'dog']
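To see what the `zip((0,) + idxs, idxs + (None,))` expression does, here is a small trace for one choice of split points; the tuple `(1, 3)` is just an illustrative pick. The last lines check the count: choosing n of the L - 1 gaps for n = 1 .. L-1 gives 2^(L-1) - 1 results, matching the 7 lines printed for the four-word example.

```python
from math import comb  # Python 3.8+

words = "the really fast dog".split()
idxs = (1, 3)  # one tuple that itertools.combinations(range(1, 4), 2) yields
starts = (0,) + idxs  # (0, 1, 3): index where each phrase begins
ends = idxs + (None,)  # (1, 3, None): where each phrase stops; None slices to the end
pieces = [words[i:j] for i, j in zip(starts, ends)]
print(pieces)  # [['the'], ['really', 'fast'], ['dog']]

L = len(words)
total = sum(comb(L - 1, n) for n in range(1, L))  # 3 + 3 + 1
print(total)  # 7, i.e. 2 ** (L - 1) - 1
```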
answered 2013-08-23T15:51:04.997

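Think about the gaps between the words. Every subset of that set of gaps corresponds to a set of split points, and therefore to one way of splitting the phrase: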

the fast dog jumps
   ^1   ^2  ^3     - these are split points

For example, the subset {1,3} corresponds to the split {"the", "fast dog", "jumps"}.

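The subsets can be enumerated as the binary numbers from 1 to 2^(L-1)-1, where L is the number of words, reading the bits left to right as gaps 1, 2, 3, ... (0 would mean no split at all):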

001 -> the fast dog, jumps
010 -> the fast, dog jumps
011 -> the fast, dog, jumps
etc.
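A sketch of that enumeration, assuming (as in the listing above) that the leftmost bit stands for the first gap; zero-padded `format(mask, "03b")`-style strings give one bit per gap. The helper names `split_by_bits` and `enumerate_splits` are hypothetical:

```python
def split_by_bits(words, bits):
    """Apply a bit string such as '011' to the gaps between `words` (leftmost bit = first gap)."""
    phrases = [words[0]]
    for bit, word in zip(bits, words[1:]):
        if bit == "1":
            phrases.append(word)  # split at this gap: start a new phrase
        else:
            phrases[-1] += " " + word  # no split: extend the current phrase
    return phrases


def enumerate_splits(text):
    """Yield (bit string, phrases) for masks 1 .. 2^(L-1)-1."""
    words = text.split()
    if not words:
        return
    gaps = len(words) - 1
    for mask in range(1, 2 ** gaps):  # mask 0 (no split at all) is skipped
        bits = format(mask, f"0{gaps}b")  # zero-padded to one bit per gap
        yield bits, split_by_bits(words, bits)


for bits, phrases in enumerate_splits("the fast dog jumps"):
    print(bits, "->", ", ".join(phrases))
```

The first three lines printed are the `001`, `010` and `011` rows of the listing above.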
answered 2013-08-23T15:57:11.160

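I'll elaborate on @grep's solution, using only the built-ins you mentioned in your question and avoiding recursion. You could implement his answer along these lines: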

#! /usr/bin/python3

def partition(phrase):
    words = phrase.split()  # split the phrase into words
    gaps = len(words) - 1  # one gap fewer than words (fencepost problem)
    for i in range(1 << gaps):  # the 2^gaps possible partitions (i == 0: no split)
        r = words[:1]  # the result starts with the first word
        for word in words[1:]:
            if i & 1:
                r.append(word)  # bit is 1: split at this gap
            else:
                r[-1] += ' ' + word  # bit is 0: don't split, extend the last phrase
            i >>= 1  # move on to the bit for the next gap
        yield r  # hand back this partition

for part in partition('The really fast dog.'):
    print(part)
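Unlike the expected output in the question, the loop above also yields the unsplit phrase when `i == 0`. A minimal sketch of one way to match the question exactly is to start the range at 1; the name `partition_nontrivial` is hypothetical, and it reads bit k with a shift instead of mutating the loop variable:

```python
def partition_nontrivial(phrase):
    """Same bit trick as partition() above, but skips i == 0 (the unsplit phrase)."""
    words = phrase.split()
    gaps = len(words) - 1
    if gaps < 1:  # zero or one word: there is no gap to split at
        return
    for i in range(1, 1 << gaps):  # start at 1: i == 0 would mean no split
        r = words[:1]
        for k, word in enumerate(words[1:]):
            if (i >> k) & 1:  # bit k set: split at gap k
                r.append(word)
            else:
                r[-1] += " " + word  # bit k clear: extend the last phrase
        yield r


print(list(partition_nontrivial("the fast dog")))
```

With the lowest bit standing for the first gap, the order also matches the question's example: `[['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]`.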
answered 2013-08-23T17:22:54.407

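The operation you are asking for is usually called "partitioning" (here: splitting a sequence into contiguous pieces), and it can be done on any kind of list. So let's implement partitioning for an arbitrary list: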

def partition(lst):
    for i in range(1, len(lst)):  # first chunk is lst[:i]
        for r in partition(lst[i:]):  # recursively partition the remainder
            yield [lst[:i]] + r
    yield [lst]  # the whole list as a single chunk

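Note that longer lists have a great many partitions (2^(n-1) for n elements), so it's better to implement this as a generator. To check that it works, try: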

print(list(partition([1, 2, 3])))
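Since the function only slices and concatenates, it should work on any sliceable sequence, not just lists. A quick check with a string of characters, plus a check of the count (an n-element sequence has n - 1 gaps, each either split or not, so 2^(n-1) partitions including the unsplit one):

```python
def partition(lst):
    for i in range(1, len(lst)):  # first chunk is lst[:i]
        for r in partition(lst[i:]):  # partition whatever is left
            yield [lst[:i]] + r
    yield [lst]  # the whole sequence as one chunk


print(list(partition("abc")))  # [['a', 'b', 'c'], ['a', 'bc'], ['ab', 'c'], ['abc']]

for n in range(1, 8):
    count = sum(1 for _ in partition(list(range(n))))
    assert count == 2 ** (n - 1), (n, count)
```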

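Now, you want to partition a string using words as the elements. The simplest way to do this is to split the text into words, run the generic partition algorithm, and then join each group of words back into a string: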

def word_partition(text):
    for p in partition(text.split()):
        yield [' '.join(group) for group in p]

Again, to test it, use:

print(list(word_partition('the fast dog')))
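Because both functions are generators, nothing is computed until it is requested. A sketch using `itertools.islice` to take only the first few results from a hypothetical 20-word input (`w0 w1 ... w19`), which has 2^19 = 524288 partitions in total:

```python
from itertools import islice


def partition(lst):
    for i in range(1, len(lst)):  # first chunk is lst[:i]
        for r in partition(lst[i:]):  # partition whatever is left
            yield [lst[:i]] + r
    yield [lst]  # the whole sequence as one chunk


def word_partition(text):
    for p in partition(text.split()):
        yield [" ".join(group) for group in p]  # rejoin each group of words


text = " ".join(f"w{k}" for k in range(20))  # hypothetical input: w0 w1 ... w19
first_three = list(islice(word_partition(text), 3))  # only 3 partitions are built
for p in first_three:
    print(p)
```

This recursive version yields the all-single-words split first and the unsplit phrase last; `[p for p in word_partition(s) if len(p) > 1]` drops the unsplit one if you want the question's exact set.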
answered 2013-08-23T16:01:00.240