
How do I produce:

("A","b"),("b","C"),("A",),("b",),("C",)

from

("A","b","C")

Perhaps itertools can be used, but I can't seem to find a function that would suit.

UPDATE

Note that ("A","C") is not in the expected output as I want the subsets to contain members that are consecutive to each other.

Another example:

subset(("A","b","C","D"))

should yield:

("A","b","C"),
("b","C","D"),
("A","b"),
("b","C"),
("C","D"),
("A",),
("b",),
("C",),
("D",)

3 Answers


You can use a rolling window recipe:

def window(seq, n=2):
    """
    Returns a sliding window (of width n) over data from the sequence
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    for i in range(len(seq)-n+1):
        yield tuple(seq[i:i+n])

def shrinking_window(seq):
    for i in range(len(seq)-1, 0, -1):
        yield from window(seq, i)

print(list(shrinking_window('AbC')))
# [('A', 'b'), ('b', 'C'), ('A',), ('b',), ('C',)]
print(list(shrinking_window('AbCD')))
# [('A', 'b', 'C'), ('b', 'C', 'D'), ('A', 'b'), ('b', 'C'), ('C', 'D'), ('A',), ('b',), ('C',), ('D',)]
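
Since the question asks about itertools, here is a minimal sketch of one possible itertools-based variant. The name contiguous_slices is hypothetical and not part of the recipe above. It assumes the input supports len() and slicing, and uses itertools.combinations only to enumerate (start, stop) index pairs.

```python
from itertools import combinations

def contiguous_slices(seq):
    """Yield every contiguous run of seq shorter than seq itself, longest first."""
    n = len(seq)
    # Each (start, stop) pair with start < stop identifies one contiguous slice.
    pairs = combinations(range(n + 1), 2)
    # Drop the full-length slice, which the question's expected output omits.
    proper = [(i, j) for i, j in pairs if j - i < n]
    # Sort by descending length; sorted() is stable, so slices of equal
    # length keep their left-to-right order.
    for i, j in sorted(proper, key=lambda p: p[0] - p[1]):
        yield tuple(seq[i:j])

print(list(contiguous_slices("AbC")))
# [('A', 'b'), ('b', 'C'), ('A',), ('b',), ('C',)]
```

This builds and sorts the whole list of index pairs up front, so the nested-loop version above is likely the better choice for long inputs.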
answered 2013-11-09T10:06:04.420

Here's one way to do it:

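input_tuple = ("A", "b", "C", "d")
output_tuples = []
# Longest sub-tuples first; range replaces the Python 2-only xrange.
for subtuple_length in reversed(range(1, len(input_tuple))):
    # Slide the start index across every position where the slice still fits.
    for start_index in range(len(input_tuple) + 1 - subtuple_length):
        output_tuples.append(input_tuple[start_index:start_index + subtuple_length])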

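This builds a list of contiguous sub-tuples; you could also print them, or yield them from a generator, or anything else. The number of sub-tuples is quadratic in the length of the input tuple, but so is the size of your expected result set, so I'm not sure there's a way around that.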
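
As a rough sketch of the generator form and of the quadratic growth, assuming Python 3: the names contiguous_subtuples and expected_count are made up for illustration. For an input of length n (n >= 1), the lengths 1 to n-1 contribute n, n-1, ..., 2 sub-tuples, which sums to n(n+1)/2 - 1.

```python
def contiguous_subtuples(items):
    """Yield contiguous sub-tuples of items, longest first, excluding items itself."""
    n = len(items)
    for length in range(n - 1, 0, -1):
        for start in range(n - length + 1):
            yield items[start:start + length]

def expected_count(n):
    # Valid for n >= 1: window counts n, n-1, ..., 2 sum to n(n+1)/2 - 1.
    return n * (n + 1) // 2 - 1

input_tuple = ("A", "b", "C", "d")
results = list(contiguous_subtuples(input_tuple))
print(len(results), expected_count(len(input_tuple)))
# 9 9
```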

answered 2013-11-09T10:25:05.693
subset(("A","b","C","D"))

should yield:

("A","b","C"),
("b","C","D"),
("A","b"),
("b","C"),
("C","D"),
("A",),
("b",),
("C",),
("D",)

Sliding windows can be hard. Iterating over shrinking or growing windows is doubly so.

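First, list the steps needed to solve the problem, then write a function that follows them:

  1. Start with the largest window size (one less than the total length, going by the example code).
  2. Then calculate the number of windows required to cover the data set.
  3. Then, for each window, reuse its index as the starting index, and add the window size to the starting index to determine where the window stops.

The resulting function: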

def subset(data):
    total_length = len(data)
    for window_length in range(total_length - 1, 0, -1): # biggest first
        n_windows = total_length - window_length + 1
        for each_window in range(n_windows):
            start = each_window
            stop = start + window_length
            yield data[start:stop]

Sample data:

data = ("A","b","C","D")

Now, calling subset on data returns a generator, and if we pass that to list, it materializes the result:

>>> subset(data)
<generator object subset at 0x7fbc3d7f3570>
>>> list(subset(data))
[('A', 'b', 'C'), ('b', 'C', 'D'), ('A', 'b'), ('b', 'C'), ('C', 'D'), ('A',), ('b',), ('C',), ('D',)]
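
Because the result is a generator, windows are only computed as they are requested. A small sketch of that, assuming Python 3; the function body is repeated here only so the snippet runs on its own, and the use of itertools.islice is an illustrative addition.

```python
from itertools import islice

def subset(data):
    # Same logic as the function above, repeated so this snippet is self-contained.
    total_length = len(data)
    for window_length in range(total_length - 1, 0, -1):
        for start in range(total_length - window_length + 1):
            yield data[start:start + window_length]

gen = subset(("A", "b", "C", "D"))
first_two = list(islice(gen, 2))  # only the two longest windows are computed
print(first_two)                  # [('A', 'b', 'C'), ('b', 'C', 'D')]
third = next(gen)                 # the generator resumes where it left off
print(third)                      # ('A', 'b')
```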

Deque solution:

I was intrigued by the idea of using a deque (from the collections module) as a rolling window, and decided to demonstrate it:

import collections
import pprint

def shrinking_windows(iterable):
    '''
    Given an ordered iterable (meaningless for unordered ones)
    return a list of tuples representing each possible set
    of consecutive items from the original list. e.g.
    shrinking_windows(['A', 'b', 'c']) returns 
    [('A', 'b', 'c'), ('A', 'b'), ('b', 'c') ...] but not ('A', 'c')
    '''
    window_generator = range(len(iterable), 0, -1)
    results = []
    for window in window_generator:
        d = collections.deque((), maxlen=window)
        for i in iterable:
            d.append(i)
            if len(d) == window:
                results.append(tuple(d))
    return results

pprint.pprint(shrinking_windows('AbCd'))

which nicely returns:

[('A', 'b', 'C', 'd'),
 ('A', 'b', 'C'),
 ('b', 'C', 'd'),
 ('A', 'b'),
 ('b', 'C'),
 ('C', 'd'),
 ('A',),
 ('b',),
 ('C',),
 ('d',)]
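
Note that this output starts with the full-length tuple, which the question's expected output leaves out. A possible variant that skips it and yields lazily might look like the sketch below. The name shrinking_windows_proper is hypothetical, and the input is copied into a tuple first (an assumption made here) so that one-shot iterators can be scanned once per window size.

```python
from collections import deque

def shrinking_windows_proper(iterable):
    """Yield consecutive runs shorter than the whole input, longest first."""
    items = tuple(iterable)  # materialize once so the data can be re-scanned
    for size in range(len(items) - 1, 0, -1):
        # With maxlen set, appending to a full deque drops the oldest item.
        d = deque(maxlen=size)
        for item in items:
            d.append(item)
            if len(d) == size:
                yield tuple(d)

print(list(shrinking_windows_proper(iter("AbC"))))
# [('A', 'b'), ('b', 'C'), ('A',), ('b',), ('C',)]
```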
answered 2013-11-09T13:23:35.937