python - Skip the first row and the first two columns of a txt-formatted database

Question

I have a plain txt database named DB.TXT (the Tab delimiter is used only between the numbers), which looks like this:

Date        Id  I   II  III IV  V
17-jan-13   aa  47  56  7   74  58
18-jan-13   ab  86  2   30  40  75
19-jan-13   ac  72  64  41  81  80
20-jan-13   ad  51  26  43  61  32
21-jan-13   ae  31  62  32  25  75
22-jan-13   af  60  83  18  35  5
23-jan-13   ag  29  8   47  12  69

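I would like Python code that skips the first line (Date, Id, I, II, III, IV, V) and the first two columns (Date and Id) when reading the text file. (The remaining numbers will be used for addition, multiplication, and so on.)

After reading the txt file, the result should look like this: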

47 56 7 74 58

86 2 30 40 75

72 64 41 81 80

51 26 43 61 32

31 62 32 25 75

60 83 18 35 5

29 8 47 12 69

The file is in txt format, not CSV.

score 0 · Accepted Answer

Use the csv module. To skip the first line, simply advance the file iterator by calling next(f). To skip the first two columns, use row = row[2:]:

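import csv

filename = 'DB.TXT'  # the file from the question
with open(filename, newline='') as f:  # Python 3 text mode; on Python 2 open with 'rb'
    next(f)   # skip the first line
    for row in csv.reader(f, delimiter='\t'):
        row = row[2:]              # skip the first two columns
        row = list(map(int, row))  # convert the strings to ints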
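
As a self-contained check of the approach above, here is a minimal Python 3 sketch that reads sample data from an in-memory string rather than from DB.TXT. The `read_numbers` helper and the `SAMPLE` constant are illustrative names, not part of the original answer, and the sketch assumes every column is strictly tab-separated.

```python
import csv
import io

# Hypothetical sample mirroring the first rows of DB.TXT (tab-separated).
SAMPLE = (
    "Date\tId\tI\tII\tIII\tIV\tV\n"
    "17-jan-13\taa\t47\t56\t7\t74\t58\n"
    "18-jan-13\tab\t86\t2\t30\t40\t75\n"
)


def read_numbers(f):
    """Return the numeric part of each row as a list of ints."""
    next(f)  # skip the header line
    rows = []
    for row in csv.reader(f, delimiter="\t"):
        if not row:  # ignore blank lines
            continue
        rows.append([int(cell) for cell in row[2:]])  # drop Date and Id
    return rows


numbers = read_numbers(io.StringIO(SAMPLE, newline=""))
print(numbers)
```

Collecting the rows into a list keeps the numbers available after the file is closed, which is convenient if several calculations are run on them later.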

score 0 · Accepted Answer

If you only want to do calculations on the rows, you can simply do the following:

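with open("data.txt") as fh:
    next(fh)  # skip the header line (fh.next() only works on Python 2)
    for line in fh:
        line = line.split()  # This split works equally well for tabs and other spaces
        do_something(line[2:])  # do_something is a placeholder for your own processing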
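
Since the question mentions adding and multiplying the numbers, `do_something` could be filled in along the following lines. This is only a sketch: `row_stats` and `SAMPLE` are made-up names, the data comes from an in-memory string instead of data.txt, and `math.prod` assumes Python 3.8 or later.

```python
import io
import math

# Hypothetical sample; note the mix of spaces and tabs between fields.
SAMPLE = (
    "Date        Id  I   II  III IV  V\n"
    "17-jan-13   aa\t47\t56\t7\t74\t58\n"
    "18-jan-13   ab\t86\t2\t30\t40\t75\n"
)


def row_stats(fh):
    """Yield (sum, product) for the numeric columns of each data line."""
    next(fh)  # skip the header line
    for line in fh:
        fields = line.split()  # splits on any run of whitespace
        if not fields:  # ignore blank lines
            continue
        values = [int(x) for x in fields[2:]]  # drop Date and Id
        yield sum(values), math.prod(values)


stats = list(row_stats(io.StringIO(SAMPLE)))
print(stats)
```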

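If your needs are more complex, you are better off using a library such as Pandas, which handles headers and label columns as well as regular-expression delimiters, and gives you easy access to the columns: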

import pandas
data = pandas.read_csv("blah.txt", sep=r"\s+", index_col=[0,1])  # raw string avoids an invalid-escape warning
data.values   # array of values as requested
data.sum()    # sum of each column
data.product(axis=1)    # product of each row
# etc.

sep is a regular expression, since you said the delimiter is not always \t, and index_col turns the first two columns into the row index.
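
To illustrate what a regular-expression separator such as `\s+` does, the standard-library sketch below splits one line that mixes spaces and tabs. It uses `re.split` rather than pandas, so it only approximates how the `sep` argument behaves; the sample line is made up from the question's data.

```python
import re

# One data line with spaces after the date and tabs between the numbers.
line = "17-jan-13   aa\t47\t56\t7\t74\t58\n"

# \s+ matches any run of whitespace, so tabs and spaces are treated alike.
fields = re.split(r"\s+", line.strip())

# The first two fields play the role of the labels; the rest are numbers.
labels = fields[:2]
values = [int(x) for x in fields[2:]]

print(labels)
print(values)
```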

score 0 · Accepted Answer

"Code in Python" is very broad. With numpy, it is:

In [20]: import numpy as np

In [21]: np.genfromtxt('db.txt',dtype=None,skip_header=1,usecols=range(2,7))
Out[21]: 
array([[47, 56,  7, 74, 58],
       [86,  2, 30, 40, 75],
       [72, 64, 41, 81, 80],
       [51, 26, 43, 61, 32],
       [31, 62, 32, 25, 75],
       [60, 83, 18, 35,  5],
       [29,  8, 47, 12, 69]])
