0

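I want to implement an algorithm that finds the indices at which the letters in a list change. Given the list below, I want to find the index where each new run of letters starts and collect these in a result list. The first run is the exception: for it, I want the last index at which it occurs. Let me give an example: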

letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']

Transitions:

 'A','A','A','A','A','A','A','A','A','A','A','A'-->'B'-->'C','C'-->'X'-->'D'-->'X'-->'B','B'-->'A','A','A','A'

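Here, once the run of A ends, B begins, so we should record the index of the last A and the index of the first B, and so on. The letter X must not appear in the result list.
Expected result: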

  [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]

So far I have written the code below, which finds every item except (11, 'A'). How can I modify it to get the desired result?

result = []
for i in range(len(letters)):
    if letters[i]!='X' and letters[i]!=letters[i-1]:
        result.append((i,(letters[i])))

My result:

[(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')] ---> missing (11, 'A').
4

8 Answers

3

Now that you've explained you want the first index of every letter after the first, here's a one-liner:

letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
[(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]

Now, your first entry is different. For this, you need to use a recipe which finds the last index of each item:

import itertools
grouped = [(len(list(g))-1,k) for k,g in (itertools.groupby(letters))]
weird_transitions = [grouped[0]] + [(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]

Of course, you could avoid creating the whole list of grouped, because you only ever use the first item from groupby. I leave that as an exercise for the reader.

This will also give you an X as the first item, if X is the first (set of) items. Because you say nothing about what you're doing, or why the Xs are there, but omitted, I can't figure out if that's the right behaviour or not. If it's not, then probably use my entire other recipe (in my other answer), and then take the first item from that.
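
For what it's worth, the exercise mentioned above could be sketched roughly like this. The helper name first_run_last_index is made up for illustration and is not part of itertools; it only consumes the first group that groupby yields, so the rest of the list is never grouped.

```python
import itertools

letters = ['A','A','A','A','A','A','A','A','A','A','A','A',
           'B','C','C','X','D','X','B','B','A','A','A','A']

def first_run_last_index(seq):
    """Hypothetical helper: (last index, item) of the first run, or None if seq is empty."""
    for key, group in itertools.groupby(seq):
        # Only the first group is ever consumed; count its members lazily.
        return (sum(1 for _ in group) - 1, key)
    return None

first = first_run_last_index(letters)
rest = [(n + 1, b) for n, (a, b) in enumerate(zip(letters, letters[1:]))
        if a != b and b != 'X']
weird_transitions = ([first] if first is not None else []) + rest
print(weird_transitions)
#=> [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
```

Like the recipe above, this still reports an X if the list starts with a run of Xs.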

answered 2013-08-01T21:10:48.143
2

Your question is a bit confusing, but this code should do what you want.

result = []
firstChangeFound = False
# Start at 1 so that letters[i-1] never wraps around to the last element.
for i in range(1, len(letters)):
    if letters[i]!='X' and letters[i]!=letters[i-1]:
        if not firstChangeFound:
            result.append((i-1, letters[i-1])) #Grab the last occurrence of the first character
            firstChangeFound = True
        result.append((i, letters[i]))
answered 2013-08-01T20:27:58.253
2

You want (or rather you don't, as you eventually explained; see my other answer):

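import itertools
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
# (run length, letter) for every run, including the X runs
grouped = [(len(list(g)),k) for k,g in (itertools.groupby(letters))]
#=> [(12, 'A'), (1, 'B'), (2, 'C'), (1, 'X'), (1, 'D'), (1, 'X'), (2, 'B'), (4, 'A')]
# start from -1 to take this from counts to indices
# (the original used scanl from the functional package on PyPI; accumulate with initial= needs Python 3.8+)
running = itertools.accumulate(grouped, lambda acc, run: (acc[0]+run[0], run[1]), initial=(-1,'X'))
list(filter(lambda t: t[1]!='X', running))
#=> [(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]

This gives you the last index of each run of letters, leaving out the Xs. If you want the first index after the letter in question instead, switch the -1 to 0.

scanl is a reduce that returns the intermediate results; itertools.accumulate is the standard-library equivalent used here.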

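As a general rule, it makes sense to filter either first or last, unless that is expensive for some reason, or the filtering can easily be done along the way without adding complexity.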

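Also, your code is relatively hard to read and understand because you iterate by index. That is unusual in Python unless you are manipulating the indices numerically. If you are visiting every item, it is usual to iterate directly.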
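
To illustrate the point about iterating directly, here is a small side-by-side sketch on a shortened, made-up list (not the list from the question). Both loops collect the first index of every non-X run after the first; the second walks over neighbouring pairs instead of indexing into the list.

```python
letters = ['A', 'A', 'B', 'X', 'C', 'C']  # shortened sample, for illustration only

# Index-based loop, in the style of the question
by_index = []
for i in range(1, len(letters)):
    if letters[i] != 'X' and letters[i] != letters[i - 1]:
        by_index.append((i, letters[i]))

# Direct iteration over neighbouring pairs; enumerate supplies the index of `cur`
by_pairs = []
for i, (prev, cur) in enumerate(zip(letters, letters[1:]), start=1):
    if cur != 'X' and cur != prev:
        by_pairs.append((i, cur))

print(by_pairs)
#=> [(2, 'B'), (4, 'C')]
```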

Also, why do you want this particular format? It is more usual to have the format (unique item, data), because that can easily be put into a dict.
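
As a rough sketch of what that could look like: the letters in the expected result repeat, so this example assumes you would want a list of indices per letter. The name by_letter is only illustrative.

```python
from collections import defaultdict

pairs = [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]

# Flip each (index, letter) pair to letter -> [indices]; letters repeat, so collect lists.
by_letter = defaultdict(list)
for index, letter in pairs:
    by_letter[letter].append(index)

print(dict(by_letter))
#=> {'A': [11, 20], 'B': [12, 18], 'C': [13], 'D': [16]}
```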

answered 2013-08-01T19:54:54.380
1

With minimal changes to your code, and following Josh Caswell's suggestion:

result = []
for i, letter in enumerate(letters[1:], 1):
    if letter != 'X' and letter != letters[i-1]:
        result.append((i, letter))
# Everything before the first recorded change is the first stretch (minus trailing Xs).
first_change = result[0][0]
first_stretch = ''.join(letters[:first_change]).rstrip('X')
if first_stretch:
    result.insert(0, (len(first_stretch) - 1, first_stretch[-1]))
answered 2013-08-01T21:23:39.470
1

Here is a solution that uses groupby to build a single sequence from which both the first and the last indices can be extracted.

import itertools
import functools
import operator
letters = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'X', 'D', 'X', 'B', 'B', 'A', 'A', 'A', 'A']

groupbysecond = functools.partial(itertools.groupby,key=operator.itemgetter(1))

def transitions(letters):
    #segregate transition and non-transition indices
    grouped = groupbysecond(enumerate(zip(letters,letters[1:])))
    # extract first such entry from each group
    firsts = (next(l) for k,l in grouped)
    # group those entries together - where multiple, there are first and last
    # indices of the run of letters
    regrouped = groupbysecond((n,a) for n,(a,b) in firsts)
    # special case for first entry, which wants last index of first letter
    kfirst,lfirst = next(regrouped)
    firstitem = (tuple(lfirst)[-1],) if kfirst != 'X' else ()
    #return first item, and first index for all other letters
    return itertools.chain(firstitem,(next(l) for k,l in regrouped if k != 'X'))
answered 2013-08-05T11:22:13.950
0

Here is a solution that uses a dictionary to keep the running time linear in the size of the input:

letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']

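def f(letters):
    result = []
    added = {}  # indices already recorded, for constant-time duplicate checks
    for i in range(len(letters) - 1):
        if letters[i+1]!='X' and letters[i+1]!=letters[i]:
            # Only the first transition also records the last index of the outgoing run;
            # for later runs this would add unwanted entries such as (19, 'B').
            if not result and letters[i]!='X':
                result.append((i, letters[i]))
                added[i] = letters[i]
            if i+1 not in added:
                result.append((i+1, letters[i+1]))
                added[i+1] = letters[i+1]
    return result

Basically, my solution adds the indices on both sides of a change: the last index of the outgoing run (needed only at the first transition, which is what produces (11, 'A')) and the first index of the incoming run. The dictionary, which has constant-time lookup, tells us whether an index has already been added, so duplicates are excluded. Alternatively, you could use an if statement that marks a first round which runs only once; I think that solution has the same running time. Just don't check whether an element was already added by searching the result list itself: that is a linear-time lookup in the worst case, which makes the whole thing O(n^2), and that is bad!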

answered 2013-08-01T20:45:28.960
0
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
    #     0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23 
prev = letters[0]
result = []
for i in range(len(letters)):
    if prev!=letters[i]:
        result.append((i-1,prev))
    if letters[i]!='X':
        prev = letters[i]
    elif i+1 < len(letters):  # guard against an 'X' in the last position
        prev = letters[i+1]

result.append((len(letters)-1,letters[-1]))
print(result)

Result (not what the OP wanted, sorry, I must have misunderstood; see JSutton's answer):

[(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]

This is actually the index of the last instance of each letter before it changes or the list ends.
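
If that "last index of each run" reading is what you are after, one possible alternative sketch groups the enumerated list with itertools.groupby. The function name last_index_of_runs and its skip parameter are made up for illustration.

```python
from itertools import groupby

letters = ['A','A','A','A','A','A','A','A','A','A','A','A',
           'B','C','C','X','D','X','B','B','A','A','A','A']

def last_index_of_runs(seq, skip='X'):
    """Hypothetical helper: (last index, item) for every run whose item is not `skip`."""
    out = []
    # Group (index, item) pairs by item, so each group is one run with its indices.
    for key, group in groupby(enumerate(seq), key=lambda pair: pair[1]):
        last_index = list(group)[-1][0]
        if key != skip:
            out.append((last_index, key))
    return out

print(last_index_of_runs(letters))
#=> [(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]
```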

answered 2013-08-01T19:51:25.787
-1

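Here is my suggestion. It works in three steps.

  1. First, find the starting index of every run of letters.
  2. Replace the index of the first non-X run with the index at which that run ends, which is one less than the start of the next run.
  3. Filter out all the X runs.

Code: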

def letter_runs(letters):
    prev = None
    results = []

    for index, letter in enumerate(letters):
        if letter != prev:
            prev = letter
            results.append((index, letter))

    if results[0][1] != "X":
        results[0] = (results[1][0]-1, results[0][1])
    else: # if first run is "X" second must be something else!
        results[1] = (results[2][0]-1, results[1][1])

    return [(index, letter) for index, letter in results if letter != "X"]
answered 2013-08-05T12:23:01.910