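I am trying to build a database from a numerical model's output text file. The file has four (4) lines of title block data, followed by many lines (41,149) of data blocks. Each block begins with a line containing the word "INTERNAL" followed by some numeric data, as shown below: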
Line1: Title block
Line2: Title block
Line3: Title block
Line4: Title block
Line5: INTERNAL 1.0 (10E16.9) -1
Line6: data data data data
Line7: data data data data
Line8 to Line25: data data data data
Line26: data data data data
Line27: INTERNAL 1.0 (10E16.9) -1
Line28: data data data data
..etc all the way down to line 41,149
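The data blocks are not a consistent size (i.e., some blocks have more lines of data than others). Thanks to a lot of help from this site, I have been able to take the 41,149 lines of data and organize each data block into its own list, which I can then parse to build the database. My problem is that this operation takes a very long time to run. I was hoping someone could look at my code below and suggest how to make it run more efficiently. I can attach the model output file if needed. Thanks!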
    inFile = 'CONFINED_AQIFER.DIS'
    strings = ['INTERNAL']
    rowList = []
    # Create a list of each row number where a data block begins
    with open(inFile) as myFile:
        for num, line in enumerate(myFile, 1):
            if any(s in line for s in strings):
                rowList.append(num)
    # Function to get line data from row number
    def getlineno(filename, lineno):
        if lineno < 1:
            raise TypeError("First line is line 1")
        f = open(filename)
        lines_read = 0
        while 1:
            lines = f.readlines(100000)
            if not lines:
                return None
            if lines_read + len(lines) >= lineno:
                return lines[lineno-lines_read-1]
            lines_read += len(lines)
    # Organize each data block into a unique list and append to a final list (fList)
    fList = []
    for row in range(len(rowList[1:])):
        combinedList = []
        i = rowList[row]
        data = []
        while i < rowList[row+1]:
            line = getlineno(inFile, i)
            data.append(line.split())
            i+=1
        for d in range(len(data))[1:]:
            for x in data[d]:
                combinedList.append(x)
        fList.append(combinedList)
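
For comparison, note that getlineno reopens the file and rescans it from the top for every line requested, so the loop above rereads the file thousands of times. Below is a minimal single-pass sketch, assuming the file can be iterated once from top to bottom and that any line containing "INTERNAL" starts a new block. The function name read_blocks and the inline sample data are hypothetical, used only for illustration. Unlike the loop above, this version also keeps the final block.

```python
import io

def read_blocks(lines, marker="INTERNAL"):
    """Return one flat list of whitespace-separated values per data block."""
    blocks = []
    current = None  # stays None until the first marker line is seen
    for line in lines:
        if marker in line:
            # A marker line opens a new block; the marker line itself is not stored.
            current = []
            blocks.append(current)
        elif current is not None:
            # Data line: add its values to the block currently being filled.
            current.extend(line.split())
        # Lines before the first marker (the title block) are ignored.
    return blocks

# Hypothetical miniature input in the same layout as the model output file.
sample = io.StringIO(
    "Title block\n"
    "Title block\n"
    "Title block\n"
    "Title block\n"
    "INTERNAL 1.0 (10E16.9) -1\n"
    "1.0 2.0\n"
    "3.0\n"
    "INTERNAL 1.0 (10E16.9) -1\n"
    "4.0 5.0\n"
)
blocks = read_blocks(sample)
print(blocks)
```

With the real file, the same function could be called as `with open(inFile) as myFile: fList = read_blocks(myFile)`, since an open text file can be iterated line by line in the same way as the sample.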