python-2.7 - Reversing (or simplifying) a Cartesian product?

Question

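To make things simpler but also more complicated, I tried to implement a concept of "combined/succinct tags" that expand into multiple basic tag forms.

In this scheme, a tag consists of (one or more) "subtags", separated by colons: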

food:fruit:apple:sour/sweet

drink:coffee/tea:hot/cold

wall/bike:painted:red/blue

The slashes indicate that "subtags" are interchangeable. The interpreter therefore translates them into this:

food:fruit:apple:sour
food:fruit:apple:sweet

drink:coffee:hot
drink:coffee:cold
drink:tea:hot
drink:tea:cold

wall:painted:red
wall:painted:blue
bike:painted:red
bike:painted:blue

The code used (not perfect, but it works):

import itertools

def slash_split_tag(tag):
    if '/' not in tag:
        return [tag]  # always return a list so callers can iterate uniformly
    subtags = tag.split(':')
    pattern, v_pattern = (), ()
    for subtag in subtags:
        if '/' in subtag:
            pattern += (None,)
            v_pattern += (tuple(subtag.split('/')),)
        else:
            pattern += (subtag,)
    def merge_pattern_and_product(pattern, product):
        ret = list(pattern)
        for e in product:
            ret[ret.index(None)] = e
        return ret
    CartesianProduct = tuple(itertools.product(*v_pattern)) # http://stackoverflow.com/a/170248
    return [ ':'.join(merge_pattern_and_product(pattern, product)) for product in CartesianProduct ]

#===============================================================================
# T E S T
#===============================================================================

for tag in slash_split_tag('drink:coffee/tea:hot/cold'):
    print tag
print
for tag in slash_split_tag('A1/A2:B1/B2/B3:C1/C2:D1/D2/D3/D4/EE'):
    print tag

Question: how can I reverse this process? I need this for readability reasons.

score 1 · Accepted Answer

Here is a simple first attempt at such a function:

def compress_list(alist):
    """Compress a list of colon-separated strings into a more compact
    representation.
    """
    components = [ss.split(':') for ss in alist]

    # Check that every string in the supplied list has the same number of tags
    tag_counts = [len(cc) for cc in components]
    if len(set(tag_counts)) != 1:
        raise ValueError("Not all of the strings have the same number of tags")

    # For each component, gather a list of all the applicable tags. The set
    # at index k of tag_possibilities is all the possibilities for the
    # kth tag
    tag_possibilities = list()
    for tag_idx in range(tag_counts[0]):
        tag_possibilities.append(set(cc[tag_idx] for cc in components))

    # Now take the list of tags, and turn them into slash-separated strings
    tag_possibilities_strs = ['/'.join(tt) for tt in tag_possibilities]

    # Finally, stitch this together with colons
    return ':'.join(tag_possibilities_strs)

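Hopefully the comments are enough to explain how it works. A few caveats, though:

It does not do anything sensible, such as escaping, if a slash turns up inside one of the tags.
It cannot tell when there is a more subtle division, or when it has been given an incomplete list of tags. Consider this example: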
```
fish:cheese:red
chips:cheese:red
fish:chalk:red
```
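It will not realize that only cheese goes with both fish and chips; instead it collapses this to fish/chips:cheese/chalk:red, which also implies chips:chalk:red.
The order of the tags in the finished string is arbitrary (or at least, I don't think it bears any relation to the order of the strings in the given list). If that matters, you can sort tt before joining it with slashes.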

Testing with the three lists given in the question seems to work, although, as I said, the order may differ from the initial strings:

food:fruit:apple:sweet/sour
drink:tea/coffee:hot/cold
wall/bike:painted:blue/red
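
Two of the caveats above (arbitrary ordering and incomplete tag lists) could be handled along the following lines. This is only a sketch, not part of the original answer: the names expand, compress_sorted and compress_checked are hypothetical. It assumes that sorting the alternatives alphabetically is acceptable, and that an input counts as "complete" only if expanding the compressed form gives back exactly the input strings.

```python
import itertools


def expand(tag):
    """Expand a compact tag such as 'drink:coffee/tea:hot/cold'
    into the list of all full tags (hypothetical helper)."""
    choices = [subtag.split('/') for subtag in tag.split(':')]
    return [':'.join(combo) for combo in itertools.product(*choices)]


def compress_sorted(tags):
    """Same idea as compress_list above, but the alternatives at each
    position are sorted, so the result is deterministic."""
    components = [t.split(':') for t in tags]
    if len(set(len(c) for c in components)) != 1:
        raise ValueError("Not all of the strings have the same number of tags")
    # zip(*components) groups the k-th subtag of every string together
    return ':'.join('/'.join(sorted(set(column)))
                    for column in zip(*components))


def compress_checked(tags):
    """Compress, but refuse when expanding the result would not
    reproduce the input (i.e. the input is not a full product)."""
    compact = compress_sorted(tags)
    if set(expand(compact)) != set(tags):
        raise ValueError("Input is not a full Cartesian product: %r" % (compact,))
    return compact
```

With the four drink tags from the question, compress_checked returns 'drink:coffee/tea:cold/hot'. With the fish/chips example it raises ValueError, because the collapsed form would also imply chips:chalk:red.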
