python - 从多个文本文件中消除标题信息

Question

我有多个文本文件（超过 500 个文件）。每个文件都以我不需要并希望从文件中删除的标题信息开头。所有文件的标题信息在第 33 行结束。执行此类任务的最佳方式/工具是什么？

我可以访问 R，如有必要，我可以访问 python。我在下面提供了一张图片作为这些文件的一个示例。（想去掉~A之前的信息）

感谢您提前提供帮助。

score 1 · Accepted Answer

import os

filename = 'foo.txt'
temp_filename = 'foo.temp.txt'

with open(filename) as f:
    # skip 32 lines:
    for n in range(32):
        f.readline()
    # write data from line 33 and next lines to a new file
    with open(temp_filename, 'w') as w:
        w.writelines(f)

# delete original file and rename the temp file so it replaces the original
os.remove(filename)
os.rename(temp_filename, filename)

score 1 · Accepted Answer

熊猫read_csv有一个skiprows参数：

pd.read_csv('foo.txt', skiprows=33)

或者，使用上下文处理程序：

with pd.read_csv('foo.txt', skiprows=33) as f:

score 0 · Accepted Answer

Rread.table有一个跳过参数。但是，标题行开头的“~A”需要特殊处理。我想我可能也会把它排除在外，然后按照你的意愿分配列名。

 filename <- "sthng.txt"
 my_df <- read.table( filename, header = FALSE, 
                                colnames=c("DET", "hello", "Variable"),
                                skip = 34)

python - 从多个文本文件中消除标题信息

3 回答 3

Related

Reference