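Here is one approach:
- Sort your entries.
- Determine the length of the common prefix between each pair of adjacent entries.
- Group the entries by splitting the list at each point where the common prefix is shorter than it was for the previous entry.
Example implementation: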
def common_count(t0, t1):
    "returns the length of the longest common prefix"
    for i, pair in enumerate(zip(t0, t1)):
        if pair[0] != pair[1]:
            return i
    # no mismatch found: the shorter string is a prefix of the longer one
    # (this also covers empty strings, where the loop body never runs)
    return min(len(t0), len(t1))


def group_by_longest_prefix(iterable):
    "given a sorted list of strings, group by longest common prefix"
    longest = 0
    out = []

    for t in iterable:
        if out:  # if there are previous entries
            # determine length of prefix in common with previous line
            common = common_count(t, out[-1])

            # if the current entry has a shorter prefix, output previous
            # entries as a group then start a new group
            if common < longest:
                yield out
                longest = 0
                out = []
            # otherwise, just update the target prefix length
            else:
                longest = common

        # add the current entry to the group
        out.append(t)

    # return remaining entries as the last group
    if out:
        yield out
Example usage:
text = """
TOKYO-BLING.1 H02-AVAILABLE
TOKYO-BLING.1 H02-MIDDLING
TOKYO-BLING.1 H02-TOP
TOKYO-BLING.2 H04-USED
TOKYO-BLING.2 H04-AVAILABLE
TOKYO-BLING.2 H04-CANCELLED
WAY-VERING.1 H03-TOP
WAY-VERING.2 H03-USED
WAY-VERING.2 H03-AVAILABLE
WAY-VERING.1 H03-CANCELLED
"""
T = sorted(t.strip() for t in text.split("\n") if t)
for L in group_by_longest_prefix(T):
    print(L)
This produces:
['TOKYO-BLING.1 H02-AVAILABLE', 'TOKYO-BLING.1 H02-MIDDLING', 'TOKYO-BLING.1 H02-TOP']
['TOKYO-BLING.2 H04-AVAILABLE', 'TOKYO-BLING.2 H04-CANCELLED', 'TOKYO-BLING.2 H04-USED']
['WAY-VERING.1 H03-CANCELLED', 'WAY-VERING.1 H03-TOP']
['WAY-VERING.2 H03-AVAILABLE', 'WAY-VERING.2 H03-USED']
See it in action here: http://ideone.com/1Da0S
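
To see why the groups split where they do, it can help to look at the common-prefix length between each pair of adjacent sorted entries. The following is a minimal sketch and not part of the implementation above: it uses `os.path.commonprefix` from the standard library in place of `common_count`, and the helper name `adjacent_prefix_lengths` is made up for illustration.

```python
import os.path

# A subset of the sample entries from the usage example, sorted.
entries = sorted([
    "TOKYO-BLING.1 H02-AVAILABLE",
    "TOKYO-BLING.1 H02-MIDDLING",
    "TOKYO-BLING.1 H02-TOP",
    "TOKYO-BLING.2 H04-AVAILABLE",
    "WAY-VERING.1 H03-TOP",
])


def adjacent_prefix_lengths(sorted_entries):
    """Hypothetical helper: common-prefix length for each adjacent pair."""
    return [
        # commonprefix compares character by character, not path components
        len(os.path.commonprefix([prev, cur]))
        for prev, cur in zip(sorted_entries, sorted_entries[1:])
    ]


lengths = adjacent_prefix_lengths(entries)
for cur, n in zip(entries[1:], lengths):
    # show each entry with the prefix it shares with the entry before it
    print(n, repr(cur[:n]), cur)
```

For this subset the lengths are 18, 18, 12 and 0. The drop from 18 to 12 marks the boundary between the `TOKYO-BLING.1` and `TOKYO-BLING.2` entries, and the drop to 0 marks the start of the `WAY-VERING` entries, which is where `group_by_longest_prefix` would start a new group.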