python - Skipping lines and splitting them into columns in a Python text parser

Question

I'm trying to parse a space-delimited text file in Python 2.7.5. It looks something like this:

variable         description      useless data
a1                asdfsdf           2342354 
            Sometimes it goes into further detail about the 
            variable/description here
a2                asdsfda           32123

Edit: sorry about the spaces added at the start of the lines, I didn't see them.

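I'd like to split the text file into an array with the variable and the description in 2 separate columns, cutting out all the useless data and skipping any line that doesn't start with a string. This is how I've set up my code so far: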

import os
import pandas
import numpy
os.chdir(r'C:\folderwithfiles')  # raw string, so \f is not read as a form feed
f = open('Myfile.txt', 'r')
lines = f.readlines()
for line in lines:
    if not line.strip():
        continue
    else:
        print(line)
print(lines)

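As of now, this code skips most of the description lines between the variable lines, but some still show up in the parse. Any help with the line-skipping problem, or with getting started on the column-forming part, would be great! I don't have much experience with Python either. Thanks!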

Edit: part of the file before running the code:

CASEID            (id) Case Identification                   1   15   AN



MIDX              (id) Index to Birth History                16   1  No
                           1:6

After:

CASEID            (id) Case Identification                   1   15   AN

MIDX              (id) Index to Birth History                16   1  No
                           1:6

score 1 · Accepted Answer

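You want to filter out the lines that start with a space, and split all the other lines to get the first two columns.

Translating those two rules into code: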

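with open('Myfile.txt') as f:
    for line in f:
        # Skip blank lines and description lines that start with whitespace.
        if line.strip() and not line[0].isspace():
            # Split on whitespace at most twice; the remainder is discarded.
            variable, description = line.split(None, 2)[:2]
            print(variable, description)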

That's all there is to it.

Or, translating more directly:

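with open('Myfile.txt') as f:
    # Keep only the lines that do not start with a space.
    non_descriptions = filter(lambda line: not line.startswith(' '), f)
    # Lazily split each kept line into at most three fields.
    # In Python 3, filter is lazy too, so consume values inside this with block.
    values = (line.split(None, 2) for line in non_descriptions)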

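Now values is an iterator over the split lines, each of which begins with the variable and description fields. It's nice and declarative. The first line means "filter out the lines that start with a space". The second means "split each line to get the first two columns". (You could write the first as a genexpr instead of filter, or the second as a map instead of a genexpr, but I think this is the closest to the English description.)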
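As a minimal sketch of the same two rules in a form that can be tried without a file: the helper name parse_variables and the inline sample list are illustrative assumptions, not part of the original answer. It also skips blank lines and returns a list, so the result stays usable after the file is closed.

```python
def parse_variables(lines):
    """Return [(variable, description), ...] from an iterable of text lines."""
    rows = []
    for line in lines:
        # Rule 1: skip blank lines and continuation lines that start with whitespace.
        if not line.strip() or line[0].isspace():
            continue
        # Rule 2: split on whitespace and keep only the first two fields.
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # assumption: a line with a single field is not a variable row
        rows.append((fields[0], fields[1]))
    return rows


# Sample lines copied from the question (the header row is kept as ordinary data).
sample = [
    "variable         description      useless data\n",
    "a1                asdfsdf           2342354 \n",
    "            Sometimes it goes into further detail about the \n",
    "            variable/description here\n",
    "a2                asdsfda           32123\n",
]

for variable, description in parse_variables(sample):
    print(variable + " " + description)
```

With a real file, the same helper could be called as parse_variables(f) inside the with block, since a file object is itself an iterable of lines.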

score 0 · Accepted Answer

If you're using pandas, try this:

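from pandas import read_csv
# error_bad_lines=False skips malformed rows; in pandas 1.3 and later use on_bad_lines='skip' instead.
# axis=1 makes drop remove the column rather than look for a row with that label.
data = read_csv('file.txt', error_bad_lines=False).drop(['useless data'], axis=1)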

If your file is fixed-width (rather than comma-separated values), use pandas.read_fwf instead.

score 0 · Accepted Answer

Assuming there are no spaces in your variables or descriptions, this will work:

with open('path/to/file') as infile:
    answer = []
    for line in infile:  # iterate over the opened file handle, not the builtin file
        if not line.strip():
            continue
        if line.startswith(' '): # skipping descriptions
            continue
        splits = line.split()
        var, desc = splits[:2]
        answer.append([var, desc])
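The second sample in the question has descriptions that do contain spaces, such as "(id) Case Identification", so the assumption above does not hold there. One possible workaround, sketched here under the assumption that columns are always separated by two or more spaces while words inside a description are separated by exactly one, is to split on runs of whitespace with re.split. The helper name split_columns is hypothetical.

```python
import re


def split_columns(line):
    """Split a row on runs of 2+ whitespace characters; return (variable, description)."""
    # Assumption: columns are at least two spaces apart, words within a column one space apart.
    parts = re.split(r"\s{2,}", line.strip(), maxsplit=2)
    if len(parts) < 2:
        return None  # not enough columns to be a variable row
    return parts[0], parts[1]


# Sample lines copied from the question's second excerpt.
sample = [
    "CASEID            (id) Case Identification                   1   15   AN\n",
    "\n",
    "MIDX              (id) Index to Birth History                16   1  No\n",
    "                           1:6\n",
]

rows = []
for line in sample:
    # Skip blank lines and continuation lines that start with whitespace.
    if not line.strip() or line[0].isspace():
        continue
    columns = split_columns(line)
    if columns is not None:
        rows.append(columns)

for variable, description in rows:
    print(variable + " | " + description)
```

If a description can itself contain a double space, this heuristic breaks, and slicing by fixed character offsets or pandas.read_fwf would be the safer choice.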
