
I have a text file in the following format:

AAAAATTTTTT
AAATTTTTTGGG
TTTDDDCCVVVVV

I am trying to count how many times the same character repeats consecutively at the start and at the end of each line.

I wrote the following function:

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    startCount = 0
    endCount = 0

    for char in sequence:
        if char == start:
            startCount += 1
            if ( char != start):
                break

    for char in reversed(sequence):
        if char == end:
            endCount += 1
            if ( char != end):
                break

    return startCount, endCount

The function works when called on a standalone string. For example:

seq = "TTTDDDCCVVVVV"
a,b = getStartEnd(seq)
print a,b

But when I call it inside a for loop over a file, it gives the correct values only for the last line of the file.

file = open("Test.txt", 'r')

for line in file:
    a,b = getStartEnd(str(line))
    print a, b

2 Answers

Because every line except the last one ends with a newline character.

Try the following (it strips trailing whitespace):

with open("Test.txt", 'r') as f:
    for line in f:
        a, b = getStartEnd(line.rstrip())
        print a, b
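
To see why the newline matters, here is a minimal sketch (written for Python 3, so it uses the print() function rather than the print statement above). The variable names line and stripped are only illustrative. It shows that the last character of a line read from a file is '\n', so the function ends up counting newlines rather than letters at the end:

```python
# A line as it typically arrives when iterating over a file:
# the text plus a trailing newline character.
line = "AAAAATTTTTT\n"

# sequence[-1] in getStartEnd would therefore be the newline, not 'T'.
print(repr(line[-1]))

# After rstrip() the trailing whitespace is gone and the last
# character is the letter we actually want to count.
stripped = line.rstrip()
print(repr(stripped[-1]))
```

If only the newline should be removed and other trailing whitespace kept, line.rstrip("\n") is a narrower alternative.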

By the way, ( char != end ) is always False in the following code. (The same goes for ( char != start ).)

for char in reversed(sequence):
    if char == end:
        endCount += 1
        if ( char != end): # always False because char == end
            break

Did you mean this?

for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break

How about using itertools.takewhile?

import itertools

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    start_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == start, sequence))
    end_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == end, reversed(sequence)))
    return start_count, end_count
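
As a quick check of the takewhile approach, the sketch below runs it on the three sample lines from the question. The helper name run_length, the snake_case name get_start_end, and the empty-string guard are my own additions for illustration, not part of the code above; the guard is there because sequence[0] raises IndexError on an empty string, which can happen if the file contains a blank line.

```python
import itertools

def run_length(chars, target):
    """Count how many leading items of chars are equal to target."""
    return sum(1 for _ in itertools.takewhile(lambda ch: ch == target, chars))

def get_start_end(sequence):
    """Return (length of leading run, length of trailing run)."""
    if not sequence:
        # Assumption: treat an empty line as having no runs at all.
        return 0, 0
    start_count = run_length(sequence, sequence[0])
    end_count = run_length(reversed(sequence), sequence[-1])
    return start_count, end_count

samples = ["AAAAATTTTTT", "AAATTTTTTGGG", "TTTDDDCCVVVVV"]
results = [get_start_end(s) for s in samples]
print(results)  # [(5, 6), (3, 3), (3, 5)]
```

Note that for a line made of a single repeated character, both counts equal the full length of the line.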
answered 2013-10-03T05:37:04.983

Three things. First, in your function, you probably meant to use break with the following structure:

for char in sequence:
    if char == start:
        startCount += 1
    else:
        break

for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break

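Second, when you iterate over the lines in a file, you don't need the str function to convert each line to a string. They are already strings!

Third, the lines include newline characters, written as '\n'. They tell the computer where one line ends and the next begins. To get rid of them, you can use the string's rstrip method (which removes trailing whitespace, including the newline), like this: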

file = open("Test.txt", 'r')

for line in file:
    a,b = getStartEnd(line.rstrip())
    print a, b
file.close()
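
Putting the three points together, here is one possible sketch for Python 3. It swaps the explicit loops for str.lstrip and str.rstrip with a character argument, which is a different technique from the loops above, and it skips blank lines. The names get_start_end and counts_per_line are hypothetical, and io.StringIO stands in for the real Test.txt so the example is self-contained.

```python
import io

def get_start_end(sequence):
    """Return the lengths of the leading and trailing single-character runs."""
    if not sequence:
        return 0, 0
    # Stripping every leading copy of the first character shortens the
    # string by exactly the length of the leading run (same idea at the end).
    start_count = len(sequence) - len(sequence.lstrip(sequence[0]))
    end_count = len(sequence) - len(sequence.rstrip(sequence[-1]))
    return start_count, end_count

def counts_per_line(lines):
    """Apply get_start_end to each non-blank line of an iterable of lines."""
    results = []
    for line in lines:
        stripped = line.rstrip()  # drop the trailing '\n' and other whitespace
        if stripped:              # skip blank lines
            results.append(get_start_end(stripped))
    return results

# Stand-in for open("Test.txt"); the last line has no trailing newline.
sample = io.StringIO("AAAAATTTTTT\nAAATTTTTTGGG\nTTTDDDCCVVVVV")
for a, b in counts_per_line(sample):
    print(a, b)
```

With a real file, passing the file object from a with open("Test.txt") as f: block to counts_per_line should behave the same way, and the with statement closes the file automatically.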
answered 2013-10-03T05:43:09.390