parsing - Parsing numeric data from a text file with Python

Question

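I am trying to build a database from the text output file of a numerical model. The file has four (4) lines of title-block data, followed by many lines (41,149) of data blocks. Each block starts with a line containing the word "INTERNAL" followed by some numeric data, as shown below: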

Line1: Title block
Line2: Title block
Line3: Title block
Line4: Title block
Line5: INTERNAL       1.0 (10E16.9)  -1
Line6: data data    data    data 
Line7: data data    data    data 
Line8 to Line25: data   data    data    data 
Line26: data    data    data    data 
Line27: INTERNAL       1.0 (10E16.9)  -1
Line28: data    data    data    data 
..etc all the way down to line 41,149

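The data blocks are not uniform in size (i.e., some blocks have more rows of data than others). Thanks to a lot of help from this site, I have been able to take the 41,149 lines of data and organize each data block into its own list, from which I can parse and build the database. My problem is that this operation takes a very long time to run. I was hoping someone could look at my code below and suggest how to make it run more efficiently. I can attach the model output file if needed. Thanks!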

inFile = 'CONFINED_AQIFER.DIS'

strings = ['INTERNAL']
rowList = []
#Create a list of each row number where a data block begins
with open(inFile) as myFile:
    for num, line in enumerate(myFile, 1):
        if any(s in line for s in strings):
            rowList.append(num)
#Function to get line data from row number
def getlineno(filename, lineno):
    if lineno < 1:
        raise TypeError("First line is line 1")
    f = open(filename)
    lines_read = 0
    while 1:
        lines = f.readlines(100000)
        if not lines:
            return None
        if lines_read + len(lines) >= lineno:
            return lines[lineno-lines_read-1]
        lines_read += len(lines)
#Organize each data block into a unique list and append to a final list (fList)
fList = []
for row in range(len(rowList[1:])):
    combinedList = []
    i = rowList[row]
    data = []
    while i < rowList[row+1]:
        line = getlineno(inFile, i)
        data.append(line.split())
        i+=1
    for d in range(len(data))[1:]:
        for x in data[d]:
            combinedList.append(x)
    fList.append(combinedList)

score 0 · Accepted Answer

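A few comments:

In Python 2, prefer xrange over range in loops. range builds the entire list in memory, whereas xrange returns a lazy object that produces values on demand. (In Python 3, range already behaves this way and xrange no longer exists.)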
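As a small illustration of the difference, here is a sketch in Python 3, where range is already lazy, compared against an explicitly built list (which is roughly what Python 2's range did). The value of n is arbitrary, and the sizes reported by sys.getsizeof depend on the interpreter, so no exact numbers are shown.

```python
import sys

n = 1_000_000  # arbitrary size chosen for illustration

lazy = range(n)          # lazy sequence: stores only start, stop and step
eager = list(range(n))   # materializes every element up front

lazy_size = sys.getsizeof(lazy)    # size of the range object itself
eager_size = sys.getsizeof(eager)  # size of the list object (its pointer array)

print(lazy_size, eager_size)
```

For a loop over a few hundred block indices the saving is negligible, so this point is unlikely to be what makes the code in the question slow.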

Use more list comprehensions. Change

for x in data[d]:
    combinedList.append(x)

to

combinedList.extend([x for x in data[d]])
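To make the equivalence concrete, the sketch below flattens the rows of one block three ways. The `data` sample is made up to stand in for the list of token lists built in the question. Note that extend accepts any iterable, so the inner list comprehension is not strictly needed, and itertools.chain.from_iterable is another standard-library option.

```python
from itertools import chain

# Hypothetical sample standing in for the question's `data`;
# row 0 plays the role of the INTERNAL line that the original loop skips.
data = [
    ["INTERNAL", "1.0", "(10E16.9)", "-1"],
    ["1.0", "2.0"],
    ["3.0", "4.0", "5.0"],
]

# Style used in the question: nested loops with append.
looped = []
for d in range(1, len(data)):
    for x in data[d]:
        looped.append(x)

# extend takes the row directly; no intermediate list is created.
extended = []
for row in data[1:]:
    extended.extend(row)

# chain.from_iterable flattens all rows in a single expression.
chained = list(chain.from_iterable(data[1:]))

print(looped == extended == chained)
```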

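See whether you can apply these techniques to more of your code.

As a general rule, avoid allocating memory (creating new lists) inside a for loop when you do not need to.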
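Beyond these points, most of the running time in the question's code probably comes from getlineno, which reopens the file and reads it from the top for every single line requested. A minimal sketch of a single-pass alternative follows. The function name read_blocks and its marker parameter are my own, not from the question, and it assumes that any line containing the marker starts a new block and that lines before the first marker (the title block) can be ignored.

```python
def read_blocks(lines, marker="INTERNAL"):
    """Group whitespace-separated tokens into one list per data block.

    `lines` can be any iterable of strings, such as an open file.
    Hypothetical helper: the name and signature are illustrative only.
    """
    blocks = []
    current = None  # no block is open until the first marker line is seen
    for line in lines:
        if marker in line:
            # Start a new block; the marker line itself is not stored.
            current = []
            blocks.append(current)
        elif current is not None:
            # Add this row's tokens to the block that is currently open.
            current.extend(line.split())
    return blocks


# Made-up sample mirroring the layout described in the question.
sample = """Title 1
Title 2
Title 3
Title 4
INTERNAL       1.0 (10E16.9)  -1
1.0 2.0 3.0
4.0 5.0
INTERNAL       1.0 (10E16.9)  -1
6.0 7.0
"""

blocks = read_blocks(sample.splitlines())
print(blocks)
```

With the real file, the call would be read_blocks(myFile) inside a with open(inFile) as myFile: block. Each line is read once, so the work grows linearly with file size. Unlike the loop in the question, which stops at the second-to-last marker, this sketch also keeps the final block. Tokens are left as strings here; converting them with float is a separate step.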
