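I have an array below that contains repeated strings. I want to find and replace those strings, but each time a match is made I want to change the value of the replacement string.

Let me demonstrate.

This sample array: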

SampleArray = ['champ', 'king', 'king', 'mak', 'mak', 'mak']

should be changed to:

SampleArray = ['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']

How can I make this possible? I have been at it for 3 days with no luck. Thanks in advance.

My Failed Code:

import os, collections, re

SampleArray = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
dupes = [x for x, y in collections.Counter(SampleArray).items() if y > 1]
length = len(dupes)
count = 0

while count < length:
    j = 0
    instances = SampleArray.count(dupes[count])
    while j < instances:
        re.sub(dupes[count],  dupes[count] + j, SampleArray, j)
        j += 1
    count += 1
print SampleArray    
print ''; os.system('pause')

6 Answers

I would use collections.Counter:

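from collections import Counter

sample = ['champ', 'king', 'king', 'mak', 'mak', 'mak']

# One "numbering" iterator per word: a single "" for unique words,
# otherwise 1..count (range on Python 3; xrange on Python 2 avoids building a list)
numbers = {
    word: iter([""] if count == 1 else range(1, count + 1))
    for word, count in Counter(sample).items()
}

# Each occurrence takes the next value from its own word's iterator
result = [
    word + str(next(numbers[word]))
    for word in sample
]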

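This does not require sorting or grouping the list in any way.

This solution uses iterators to generate the sequence numbers:

  • First, we count how many times each word occurs in the list (Counter(sample)).

  • Then we build a dictionary, numbers, that holds a "numbering" iterator for each word (iter(...)). If the word occurs only once (count == 1), its iterator yields a single empty string; otherwise it yields the consecutive numbers from 1 to count (xrange(1, count + 1) on Python 2, range(1, count + 1) on Python 3).

  • Finally, we walk the list again and, for each word, take the next value from its own numbering iterator (next(numbers[word])). Because the iterators yield numbers, we have to convert them to strings (str(...)).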
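
The steps above can be wrapped in a small function to check the claim that duplicates need not be adjacent. This is only a sketch: the name number_duplicates and the shuffled input are illustrative assumptions, not part of the answer, and it uses range so that it runs on Python 3.

```python
from collections import Counter


def number_duplicates(words):
    """Append 1, 2, 3... to repeated words; leave unique words unchanged."""
    # One iterator of suffixes per distinct word.
    suffixes = {
        word: iter([""] if count == 1 else range(1, count + 1))
        for word, count in Counter(words).items()
    }
    # Each occurrence consumes the next suffix of its own word.
    return [word + str(next(suffixes[word])) for word in words]


print(number_duplicates(['mak', 'king', 'champ', 'mak', 'king', 'mak']))
# ['mak1', 'king1', 'champ', 'mak2', 'king2', 'mak3']
```

Because every word owns its iterator, the numbering follows the order of appearance regardless of where the repeats sit in the list.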

answered 2013-06-05T16:00:18.747

groupby is a convenient way to group duplicates, as long as equal items sit next to each other in the list:

>>> from itertools import groupby
>>> SampleArray = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
>>> FinalArray = []
>>> for k, g in groupby(SampleArray):
...     # g is an iterator, so get a list of it for further handling
...     items = list(g)
...     # If only one item, add it unchanged
...     if len(items) == 1:
...         FinalArray.append(k)
...     # Else add index at the end
...     else:
...         FinalArray.extend([j + str(i) for i, j in enumerate(items, 1)])
...
>>> FinalArray
['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
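
One caveat worth illustrating: itertools.groupby only merges consecutive equal items, so the approach depends on duplicates being adjacent. The sketch below assumes a hypothetical helper number_runs and an unsorted input (neither is from the answer) to show what happens otherwise, and how sorting first restores the expected numbering.

```python
from itertools import groupby


def number_runs(words):
    """Number the items within each run of consecutive equal items."""
    result = []
    for key, group in groupby(words):
        items = list(group)
        if len(items) == 1:
            result.append(key)
        else:
            result.extend(key + str(i) for i in range(1, len(items) + 1))
    return result


unsorted_words = ['champ', 'mak', 'king', 'king', 'mak', 'mak']

# Non-adjacent duplicates fall into separate runs, so 'mak' is numbered per run.
print(number_runs(unsorted_words))
# ['champ', 'mak', 'king1', 'king2', 'mak1', 'mak2']

# Sorting first makes equal items adjacent, at the cost of the original order.
print(number_runs(sorted(unsorted_words)))
# ['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
```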
answered 2013-06-05T15:54:41.023

Edit

Simpler, using Counter and sorted:

from collections import Counter

L = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
counts = Counter(L)
res = []
for word in sorted(counts.keys()):
    if counts[word] == 1:
        res.append(word)
    else:
        res.extend(['{}{}'.format(word, index) for index in 
                   range(1, counts[word] + 1)])

So this input:

['champ', 'mak', 'king', 'king', 'mak', 'mak']

also gives:

['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
answered 2013-06-05T15:37:35.000

One way is to convert your array into a dictionary like this:

SampleDict = {}
for key in SampleArray:
    if key in SampleDict:
        SampleDict[key][0] = True # means: duplicates
        SampleDict[key][1] += 1 
    else:
        SampleDict[key] = [False, 1] # means: no duplicates

Now you can easily convert that dict back into an array. However, if the order of the input SampleArray matters, you can do this:

for i in range(len(SampleArray)):
    key = SampleArray[i]
    counter = SampleDict[key]
    if counter[0]:  # True only for words that occur more than once
        SampleArray[i] = key + str(counter[1])
    counter[1] -= 1

However, this numbers the duplicates in reverse order, i.e.

SampleArray = ['champ', 'king2', 'king1', 'mak3', 'mak2', 'mak1']

But I am sure you will be able to adapt it to your needs.
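
One possible adaptation that numbers the duplicates in forward order is to count upward while walking the list, instead of counting down from the total. This is a sketch under assumptions: the function name number_in_order is hypothetical, and it swaps the [flag, count] lists for two collections.Counter objects.

```python
from collections import Counter


def number_in_order(words):
    """Return a copy of words with duplicates numbered 1, 2, 3... in input order."""
    totals = Counter(words)   # how often each word occurs overall
    seen = Counter()          # how often each word has been seen so far
    result = []
    for key in words:
        if totals[key] > 1:   # only duplicated words get a suffix
            seen[key] += 1
            result.append(key + str(seen[key]))
        else:
            result.append(key)
    return result


SampleArray = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
print(number_in_order(SampleArray))
# ['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
```

Unlike the in-place loop, this version builds a new list and leaves the input untouched.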

answered 2013-06-05T15:39:19.417

Assuming you want the array sorted:

import collections

SampleArray = ['champ', 'king', 'king', 'mak', 'mak', 'mak']
counter = collections.Counter(SampleArray)
res = []
for key in sorted(counter.keys()):
    if counter[key] == 1:
        res.append(key)
    else:
        res.extend([key+str(i) for i in range(1, counter[key]+1)])

>>> res
['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
answered 2013-06-05T15:58:58.483

f = ['champ', 'king', 'king', 'mak', 'mak', 'mak']

fields_out = [x + str(f.count(x) - f[i + 1:].count(x)) for i, x in enumerate(f)]
print(fields_out)
# ['champ1', 'king1', 'king2', 'mak1', 'mak2', 'mak3']

Or, leaving the first occurrence of each word unnumbered:

fields_out = [(x if i == f.index(x) else x + str(f.count(x) - f[i + 1:].count(x))) for i, x in enumerate(f)]
print(fields_out)
# ['champ', 'king', 'king2', 'mak', 'mak2', 'mak3']
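
Neither output above matches the list asked for in the question, where unique words stay unchanged and every duplicate is numbered from 1. A small variation in the same list-comprehension style might look like the sketch below; it is an assumption about the intent, not part of the answer, and it counts occurrences up to and including the current position.

```python
f = ['champ', 'king', 'king', 'mak', 'mak', 'mak']

fields_out = [
    # Unique words stay as they are; duplicates get their running occurrence number.
    x if f.count(x) == 1 else x + str(f[:i + 1].count(x))
    for i, x in enumerate(f)
]
print(fields_out)
# ['champ', 'king1', 'king2', 'mak1', 'mak2', 'mak3']
```

Like the variants above, this calls count for every element, so it is quadratic in the list length; that is fine for short lists but slow for large ones.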
answered 2020-10-23T16:30:57.623