python - I need to use Python to extract data from multiple .txt files and move it into an Excel file

Question

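Each .txt file contains 68 lines. Line 68 holds 5 pieces of data that I need to extract, but I don't know how to do it. I have about 20 .txt files, and the 68th line of every one of them needs to be read. I then need to put all of the extracted data into a single Excel file.

This is what line 68 looks like: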

Final graph has 1496 nodes and n50 of 53706, max 306216, total 5252643, using 384548/389191 reads

I basically need all of those numbers.

score 0 · Accepted Answer

Open the text file like this:

with open('filepath.txt', 'r') as f:
    for line in f:
        pass  # do operations for each line in the text file

Repeat this for each text file you want to read.

Here is a link to a Python library for reading/writing Excel files. It sounds like you want xlwt.
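The "repeat for each file" step could look something like the sketch below. It is only an illustration: the helper names `nth_line` and `nth_line_of_files` are made up for this example, and it assumes the line of interest is always at the same position (68 by default, as in the question).

```python
from itertools import islice


def nth_line(lines, n):
    """Return the n-th line (1-based) of an iterable of lines, or None if it is too short."""
    return next(islice(lines, n - 1, n), None)


def nth_line_of_files(paths, n=68):
    """Map each path to its n-th line, without the trailing newline (hypothetical helper)."""
    result = {}
    for path in paths:
        # a file object is an iterable of lines, so nth_line stops reading after line n
        with open(path, "r") as f:
            line = nth_line(f, n)
        result[path] = line.rstrip("\n") if line is not None else None
    return result
```

Calling `nth_line_of_files(['a.txt', 'b.txt'])` (placeholder names) would return a dictionary keyed by file name; files with fewer than 68 lines map to `None` instead of raising an error.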

score 0 · Accepted Answer

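I like to use openpyxl for tasks like this. Below is an example for a single file; you should be able to extend it to multiple files. You didn't say exactly how you want the data laid out in the spreadsheet, so I just create one row of headers and then one row of data (5 fields) for the file. With more information about your project, this could be improved.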

from openpyxl import Workbook
import re

wb = Workbook()
ws = wb.active  # replaces the older get_active_sheet()

# write column headers (openpyxl rows and columns are 1-based)
ws.cell(row=1, column=1).value = 'nodes'
ws.cell(row=1, column=2).value = 'n50'
ws.cell(row=1, column=3).value = 'max'
ws.cell(row=1, column=4).value = 'total'
ws.cell(row=1, column=5).value = 'reads'

# open file and read its lines into a list
with open("somedata.txt", "r") as f:
    lines = f.readlines()

# compile regex using named groups and apply regex to line 68
p = re.compile(r"^Final\sgraph\shas\s(?P<nodes>\d+)\snodes\sand\sn50\sof\s(?P<n50>\d+),\smax\s(?P<max>\d+),\stotal\s(?P<total>\d+),\susing\s(?P<reads>\d+/\d+)\sreads$")
m = p.match(lines[67])

# if we have a match, then write the data to the second row of the spreadsheet
if m:
    ws.cell(row=2, column=1).value = m.group('nodes')
    ws.cell(row=2, column=2).value = m.group('n50')
    ws.cell(row=2, column=3).value = m.group('max')
    ws.cell(row=2, column=4).value = m.group('total')
    ws.cell(row=2, column=5).value = m.group('reads')

wb.save('mydata.xlsx')
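One possible way to extend this to multiple files is sketched below. It is an assumption-laden variant, not the answer's own code: the names `parse_summary` and `write_workbook`, the extra `file` column, and the choice to split `reads` into two integer columns are all made up for illustration. openpyxl is imported inside `write_workbook` so the parsing part can be tried without it installed.

```python
import re

# Same line format as in the question; the reads ratio is split into two groups.
PATTERN = re.compile(
    r"^Final graph has (?P<nodes>\d+) nodes and n50 of (?P<n50>\d+), "
    r"max (?P<max>\d+), total (?P<total>\d+), "
    r"using (?P<used>\d+)/(?P<available>\d+) reads$"
)


def parse_summary(line):
    """Return the six numbers on a summary line as ints, or None if it does not match."""
    m = PATTERN.match(line.strip())
    if m is None:
        return None
    return [int(value) for value in m.groups()]


def write_workbook(paths, out_path="mydata.xlsx", line_no=68):
    """Write one spreadsheet row per file (hypothetical helper)."""
    from openpyxl import Workbook  # third-party dependency, only needed here

    wb = Workbook()
    ws = wb.active
    ws.append(["file", "nodes", "n50", "max", "total", "reads_used", "reads_total"])
    for path in paths:
        with open(path, "r") as f:
            lines = f.readlines()
        # skip files that are too short or whose line does not match the pattern
        row = parse_summary(lines[line_no - 1]) if len(lines) >= line_no else None
        if row is not None:
            ws.append([path] + row)
    wb.save(out_path)
```

Storing the values as integers rather than strings means Excel treats them as numbers, which is usually what you want for sorting or charting.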

score 0 · Accepted Answer

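Compared with David's answer, which relies on a regular expression, the following is slightly less elegant but more transparent. It depends heavily on the exact format you describe. Also, it seems to me that you actually care about 6 variables (not 5), unless you can convert the ratio in reads to a decimal.

You need to supply the correct list of file names in nameList (by hand, if they are not named in a convenient way).

Also, I don't write to an Excel file but to a csv. Opening a csv file in Excel is of course very simple, and from there you can save it as xls.

Edit in response to a comment (05/19/13): including full paths is simple.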
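If the files do follow a pattern, the list does not have to be typed by hand. The sketch below is one way to build it with the standard `glob` module; the function name `find_logs` and the default pattern `*.txt` are assumptions for illustration, not something taken from the question.

```python
import glob
import os


def find_logs(root, pattern="*.txt"):
    """Return a sorted list of files under root (searched recursively) matching pattern."""
    # "**" with recursive=True matches root itself and any nested subdirectories
    return sorted(glob.glob(os.path.join(root, "**", pattern), recursive=True))
```

The result of something like `find_logs('/full/path/to')` could then be assigned to nameList in place of the hand-written list.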

import csv
import string

# Make list of all 20 files like so:
nameList = ['/full/path/to/Log.txt', '/different/path/to/Log.txt', '/yet/another/path/to/Log.txt']

lineNum = 68

myCols = ['nodes','n50','max','total','reads1','reads2']
myData = []

for name in nameList:
    # split line lineNum into list of strings
    with open(name, "r") as fi:
        strings = fi.readlines()[lineNum-1].split()

    # strip surrounding punctuation (the trailing commas) before converting
    nodes = int(strings[3])
    n50 = int(strings[8].strip(string.punctuation))
    myMax = int(strings[10].strip(string.punctuation))
    total = int(strings[12].strip(string.punctuation))
    reads1 = int(strings[14].split('/')[0])
    reads2 = int(strings[14].split('/')[1])

    myData.append([nodes, n50, myMax, total, reads1, reads2])

# Write the data out to a new csv file
# (newline="" keeps the csv module from emitting blank rows on Windows)
fileOut = "out.csv"
with open(fileOut, "w", newline="") as csvFileOut:
    myWriter = csv.writer(csvFileOut)
    myWriter.writerow(myCols)
    myWriter.writerows(myData)
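If a single decimal is preferred over the two reads columns, the ratio mentioned above could be computed along these lines. This is only a sketch; the name `reads_fraction` is made up for this example.

```python
def reads_fraction(token):
    """Turn a 'used/total' token such as '384548/389191' into a float."""
    used, total = (int(part) for part in token.split("/"))
    return used / total
```

For the line in the question, `reads_fraction('384548/389191')` comes out at roughly 0.988, and that value could be written to one `reads` column instead of `reads1` and `reads2`.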
