-1

I am reading a data file and have a list containing the file's lines, which looks like this:

>>> oz[:15]
[' 283283283283283283283283284284284284284284284284284284284284284284284284284\n',
' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
...
' 291291292292292292292292293293293293293293293293293293293293294294294294294\n',
' 294294294294294294294294295295   lat =  -89.5\n']

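Now I want to store the numbers from this list in a sensible way: I need a list with one element per 3-digit group. If I just print the output like this, everything is fine: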

for ll in range(0,60):
    for k in range(1,73+3,3):
        if k==31 and ((ll+1)%15==0): 
            break                       
        else: 
            print oz[ll][k:k+3]

I get the correct output, the numbers 283, 283, ... But if I try to store them in a list, the resulting list is wrong:

DU = []

# Populate DU array
for ll in range(0,2700):
    for k in range(1,73+3,3):
        if k==31 and ((ll+1)%15==0): 
            break                       
        else: 
            DU.append(oz[ll][k:k+3])

What am I doing wrong when filling the DU list?

Edit: let me explain better what I am trying to achieve. I have a list oz with the following format:

[' 283283283283283283283283284284284284284284284284284284284284284284284284284\n',
 ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
 ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
 ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
 ' 284284284284284284284284284284284284284284284284284284284284284284283283283\n',
 ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n',
 ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n',
 ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n',
 ' 283283283283283282282283283282282282282283283283283283283283283283283283283\n',
 ' 283283283283283283283283283283283283283283283283283283283283283284284284284\n',
 ' 284284284284284284284284284284284285285285285285285285285285285285285285285\n',
 ' 285285286286286286286286286287287287287287287288288288288288288288288288288\n',
 ' 288289289289289289289289289290290290290290290290290290291291291291291291291\n',
 ' 291291292292292292292292293293293293293293293293293293293293294294294294294\n',
 ' 294294294294294294294294295295   lat =  -89.5\n',
 ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
 ' 284284284284284284284284284284283283284284284284284284284284284284284284283\n',
 ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n',
 ' 283283283283283283283283283283284284284284284284284284284284284284284284284\n',
 ' 284284284284284284284284284284284284283283283283283283283283283283283283283\n',
 ' 283283283283283283283283283283283282282282282282282282282282282282282282281\n',
 ' 281281281281281281281281281281281281281281281281280280280280280280280280279\n',
 ' 279279279279279279279279279279279279279279278278278278278278278278278278278\n',
 ' 277277278278278278278278278278278278278278278278278278278278278278278278278\n',
 ' 278278279279279279279279279279279279279279279279279279279279279279279279279\n',
 ' 279279280280280280280280280280280280280280280280280280280281281281281281281\n',
 ' 281282282282282282282282283283283283283283284284284284284284285285285285285\n',
 ' 286286286287287287287288288288288288288289289289289289290290290290291291291\n',
 ' 292292292292292292293293293293293293293293293294294294294295295295295295295\n',
 ' 296296296296296296296297297297   lat =  -88.5\n']

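What I need is to fill a list with the three-digit groups, like ['283', '283', '283', '283'], keeping in mind that every 15th line carries the text "lat ...", which I want to strip off. I hope this is clearer now.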

4 Answers

2

Your current code looks rather hard-coded, and I'm not sure what you are trying to achieve, but try:

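DU = []

for index, line in enumerate(oz):
    # Every 15th line ends with a "lat = ..." label; keep only the digits before it.
    line = line.strip() if (index + 1) % 15 != 0 else line.strip().split(' ')[0]
    # Stop at len(line), not len(line) - 3, so the last 3-digit group is included.
    for i in range(0, len(line), 3):
        DU.append(line[i:i+3])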
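
As a side note on the loop bounds, here is a minimal Python 3 sketch of how the stop value of range() decides whether the final group survives. The string `line` is a shortened stand-in for a real 75-digit line, and the names `short` and `full` are only illustrative:

```python
# Shortened stand-in for one stripped data line (assumption: real lines hold 75 digits).
line = '283283284'

# Stopping at len(line) - 3 never reaches the start index of the last group.
short = [line[i:i + 3] for i in range(0, len(line) - 3, 3)]

# Stopping at len(line) visits every start index 0, 3, 6, ...
full = [line[i:i + 3] for i in range(0, len(line), 3)]

print(short)  # ['283', '283']
print(full)   # ['283', '283', '284']
```

Since range() excludes its stop value, the start index of the last group (here 6) is only produced when the stop is at least len(line) - 2.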

Or you could combine it with the grouped() approach from the other answer:

from itertools import izip

def grouped(iterable, n):
    "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

DU = []

for index, line in enumerate(oz):
    # Every 15th line ends with a "lat = ..." label; keep only the digits before it.
    line = line.strip() if (index + 1) % 15 != 0 else line.strip().split(' ')[0]
    # extend (rather than append) keeps DU a flat list of 3-character strings.
    DU.extend(map(''.join, grouped(line, 3)))
answered 2012-08-28T20:02:17.380
1

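You could probably use what is in the update to my answer to another question to split the string of digits into groups of n. Specifically, use this code, passing the digit string as the iterable and 3 as the value of the n (group-size) argument: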

from itertools import izip

def grouped(iterable, n):
    "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

digits = '283283283283283283283283284284284284284284284284284284284284284284284284284\n'

print map(''.join, grouped(digits.strip(), 3))

Output:

['283', '283', '283', '283', '283', '283', '283', '283', '284', '284', 
'284', '284', '284', '284', '284', '284', '284', '284', '284', '284', 
'284', '284', '284', '284', '284']

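However, I notice that the last line of data in your example is:

' 294294294294294294294294295295   lat =  -89.5\n'

which is not just a string of digits, so it has to be handled as a special case.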
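
To illustrate why that line needs special handling, here is a minimal Python 3 sketch. The helper name `chunks` and the shortened sample line are assumptions made for illustration; they are not taken from the question's code:

```python
def chunks(text, width=3):
    # Plain fixed-width slicing with no knowledge of the file format.
    return [text[i:i + width] for i in range(0, len(text), width)]

# Shortened stand-in for one of the lines that carry a latitude label.
label_line = ' 294295   lat =  -89.5\n'

# Grouping the whole stripped line also cuts the label into 3-character pieces.
naive = chunks(label_line.strip())

# Keeping only the first whitespace-separated field leaves just the digits.
handled = chunks(label_line.split()[0])

print(naive)    # ['294', '295', '   ', 'lat', ' = ', ' -8', '9.5']
print(handled)  # ['294', '295']
```

Because split() with no arguments discards leading and trailing whitespace, the same expression also works unchanged on the ordinary digit-only lines.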

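Update:

OK, now that I've seen the additional information you added to your question, I can offer a complete solution based on the grouped() function from the other answer that I originally suggested. It splits each line of data on whitespace and ignores everything except the first piece (usually the only one), which is always a string of digits, and then passes that piece to my function for further processing.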

from itertools import izip

def grouped(iterable, n):
    "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

data = [' 283283283283283283283283284284284284284284284284284284284284284284284284284\n',
        ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
        ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
        ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
        ' 284284284284284284284284284284284284284284284284284284284284284284283283283\n',
        ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n',
        ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n',
        ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n',
        ' 283283283283283282282283283282282282282283283283283283283283283283283283283\n',
        ' 283283283283283283283283283283283283283283283283283283283283283284284284284\n',
        ' 284284284284284284284284284284284285285285285285285285285285285285285285285\n',
        ' 285285286286286286286286286287287287287287287288288288288288288288288288288\n',
        ' 288289289289289289289289289290290290290290290290290290291291291291291291291\n',
        ' 291291292292292292292292293293293293293293293293293293293293294294294294294\n',
        ' 294294294294294294294294295295   lat =  -89.5\n',
        ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
        ' 284284284284284284284284284284283283284284284284284284284284284284284284283\n',
        ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n',
        ' 283283283283283283283283283283284284284284284284284284284284284284284284284\n',
        ' 284284284284284284284284284284284284283283283283283283283283283283283283283\n',
        ' 283283283283283283283283283283283282282282282282282282282282282282282282281\n',
        ' 281281281281281281281281281281281281281281281281280280280280280280280280279\n',
        ' 279279279279279279279279279279279279279279278278278278278278278278278278278\n',
        ' 277277278278278278278278278278278278278278278278278278278278278278278278278\n',
        ' 278278279279279279279279279279279279279279279279279279279279279279279279279\n',
        ' 279279280280280280280280280280280280280280280280280280280281281281281281281\n',
        ' 281282282282282282282282283283283283283283284284284284284284285285285285285\n',
        ' 286286286287287287287288288288288288288289289289289289290290290290291291291\n',
        ' 292292292292292292293293293293293293293293293294294294294295295295295295295\n',
        ' 296296296296296296296297297297   lat =  -88.5\n']

DU = []
for line in data:
    DU.extend(map(''.join, grouped(line.strip().split()[0], 3)))

print DU

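Output (abbreviated):

['283', '283', '283', ..., '297', '297', '297']

You could also write it as an efficient, though fairly unreadable, comprehension-style one-liner like this: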

from itertools import chain

DU = list(chain.from_iterable(map(''.join, grouped(line.strip().split()[0], 3))
                                             for line in data))
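
For readers on Python 3, where itertools.izip no longer exists and the built-in zip is already lazy, the same idea might be sketched as follows. This is an adaptation under that assumption, not the code as originally posted, and `data` here is a shortened stand-in for the real list:

```python
from itertools import chain

def grouped(iterable, n):
    """s -> (s0, ..., sn-1), (sn, ..., s2n-1), ...; a trailing partial group is dropped."""
    # n references to the same iterator, so zip pulls n consecutive items per tuple.
    return zip(*[iter(iterable)] * n)

# Shortened stand-in for the question's list of lines.
data = [' 283283284\n', ' 294295   lat =  -89.5\n']

DU = list(chain.from_iterable(
    map(''.join, grouped(line.split()[0], 3)) for line in data))

print(DU)  # ['283', '283', '284', '294', '295']
```

One property worth keeping in mind: zip stops at the shortest input, so if a digit run's length is not a multiple of 3, the leftover characters are silently discarded.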
answered 2012-08-28T19:35:39.143
0

How about using a regular expression to match groups of 3 consecutive digits:

import re

def oz_reader(oz):
    for line in oz:
        matches = re.findall(r"\d{3}", line)
        for num in matches:
            yield num

Note that this function returns a generator, not a list. If you really need a list with the output, just call the list constructor on it:

result_list = list(oz_reader(oz))
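
With the latitude labels shown in the question (values such as -89.5), the pattern never matches inside the label, because no run of three digits occurs there. If a label could ever contain three consecutive digits, one possible guard is to search only the leading digit run. The following Python 3 sketch assumes that situation; the function name `leading_triples` and the label value -89.125 are hypothetical:

```python
import re

def leading_triples(lines):
    """Yield 3-digit groups taken only from the digit run at the start of each line."""
    for line in lines:
        head = re.match(r"\s*(\d+)", line)   # optional whitespace, then the digit run
        if head is None:
            continue                          # no leading digits on this line
        for num in re.findall(r"\d{3}", head.group(1)):
            yield num

# Shortened stand-in lines; the second label is made up to contain a 3-digit run.
sample = [' 283283284\n', ' 294295   lat = -89.125\n']

plain = [num for line in sample for num in re.findall(r"\d{3}", line)]
guarded = list(leading_triples(sample))

print(plain)    # ['283', '283', '284', '294', '295', '125']
print(guarded)  # ['283', '283', '284', '294', '295']
```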
answered 2012-08-28T22:03:24.573
0

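Thanks everyone for the replies. The code was actually working fine; the problem was in another part of my program, and I was just tired. But thank you for the different possibilities you offered for this piece of code!

answered 2012-08-29T08:12:04.713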