python - Extracting selected columns from a datafile using python

Question

I have a data file like this

0.000       1.185e-01  1.185e-01  3.660e-02  2.962e-02  0.000e+00  0.000e+00  0.000e+00  0.000e+00  0.000e+00
0.001       1.185e-01  1.185e-01  3.660e-02  2.962e-02  -1.534e-02  -1.534e-02  8.000e-31  8.000e-31  0.000e+00
0.002       1.185e-01  1.185e-01  3.659e-02  2.961e-02  -1.541e-02  -1.541e-02  -6.163e-01  -6.163e-01  -4.284e-05
0.003       1.186e-01  1.186e-01  3.657e-02  2.959e-02  -1.547e-02  -1.547e-02  -8.000e-31  -8.000e-31  0.000e+00
0.004       1.186e-01  1.186e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.005       1.186e-01  1.186e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.006       1.187e-01  1.186e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.007       1.187e-01  1.187e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.008       1.188e-01  1.187e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00
0.009       1.188e-01  1.187e-01  3.657e-02  2.959e-02  -2.005e-32  -2.005e-32  -8.000e-31  -8.000e-31  0.000e+00

I want to copy only selected columns from this file to another file. Suppose I copy the 1st, 2nd and 6th columns to a file, then that file should look like

0.000       1.185e-01  0.000e+00
0.001       1.185e-01  -1.534e-02
0.002       1.185e-01  -1.541e-02
0.003       1.186e-01  -1.547e-02
0.004       1.186e-01  -2.005e-32
0.005       1.186e-01  -2.005e-32
0.006       1.187e-01  -2.005e-32
0.007       1.187e-01  -2.005e-32
0.008       1.188e-01  -2.005e-32
0.009       1.188e-01  -2.005e-32

This is a very large formatted text file which was initially written like this

f=open('myMD.dat','w')
s='%8.3e  %8.3e  %8.3e  %8.3e  %8.3e  %8.3e  %8.3e  %8.3e  %8.3e\t\t'%(xpos1[i],ypos1[i],xvel1[i],yvel1[i],xacc1[i],yacc1[i],xforc[i],yforc[i],potn[i])
f.write(s)
f.close()

I am programming in python. How can I do this?

score 1 · Accepted Answer

This reads the given input file and selects columns using the given comma-separated list of column numbers:

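import sys

input_name = sys.argv[1]
# Column numbers on the command line are 1-based; convert to 0-based indexes.
column_list = [(int(x) - 1) for x in sys.argv[2].split(',')]
with open(input_name) as input_file:
    for line in input_file:
        row = line.split()
        if not row:  # skip blank lines
            continue
        # Print the selected columns separated by single spaces.
        print(' '.join(row[col] for col in column_list))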

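It reads and prints one line at a time, which means it should be able to handle arbitrarily large input files. Using your example data as input.txt, I ran it like this: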

python selected_columns.py input.txt 1,2,6

It produces the following output (the ellipsis stands for lines removed for brevity):

0.000 1.185e-01 0.000e+00
0.001 1.185e-01 -1.534e-02
...
0.009 1.188e-01 -2.005e-32

You can save the output to a file using redirection:

python selected_columns.py input.txt 1,2,6 > output.txt
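
If redirection is not convenient, the same line-by-line approach can write the output file directly. The following is a minimal Python 3 sketch, not part of the answer above; the function name select_columns and its parameters are hypothetical, and column numbers are 1-based to match the command line shown earlier.

```python
def select_columns(input_name, output_name, columns):
    """Copy the given 1-based columns of a whitespace-separated file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    indexes = [c - 1 for c in columns]  # convert to 0-based list indexes
    with open(input_name) as input_file, open(output_name, "w") as output_file:
        for line in input_file:  # one line at a time, so memory use stays small
            row = line.split()
            if not row:  # skip blank lines
                continue
            output_file.write(" ".join(row[i] for i in indexes) + "\n")
```

Under these assumptions, select_columns("input.txt", "output.txt", [1, 2, 6]) would write the 1st, 2nd and 6th columns of input.txt to output.txt.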

score 1 · Accepted Answer

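Much simpler, yet versatile. Assuming none of the fields contain any whitespace, you can simply use the split method on each line to get a list of fields, and then print the fields you want. Here is a script that lets you specify the output columns and the separator string.

Note: we never convert between strings and floats. This preserves the numbers exactly as they were written and, for a huge file, saves a lot of CPU!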

COLS = 0, 1, 5  # the columns you want. The first is numbered zero.
                # NB it's a tuple: COLS = 0, for one column (mandatory trailing comma)

SEP = ', '  # the string you want to separate the columns in the output

INFILE = 't.txt'      # file to read from
OUTFILE = 'out.txt'   # file to write to

with open(INFILE, 'r') as f, open(OUTFILE, 'w') as g:
    for line in f:  # iterate lazily instead of loading the whole file with readlines()
        x = line.split()
        if x:  # ignore blank lines
            y = [x[i] for i in COLS]
            outline = SEP.join('{}'.format(q) for q in y)
            g.write(outline + '\n')

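Just realized that '{}'.format(q) for q in y is overkill here. y is a list of strings to be output unchanged, so SEP.join(y) is all you need. Still, showing the pattern for applying a format to a list of similar elements may be useful.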
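
The simplification and the note about avoiding float conversion can be sketched together. This is an illustrative Python 3 fragment under the same assumption that no field contains whitespace; the helper name pick_fields is hypothetical.

```python
def pick_fields(line, cols, sep=", "):
    """Return the selected fields of one line, joined by sep.

    Hypothetical helper; the fields stay strings from start to finish.
    """
    fields = line.split()
    return sep.join(fields[i] for i in cols)


line = "0.001       1.185e-01  1.185e-01  3.660e-02  2.962e-02  -1.534e-02"

# The original text of each number is preserved.
print(pick_fields(line, (0, 1, 5)))  # 0.001, 1.185e-01, -1.534e-02

# A round trip through float would change the text.
print(str(float("1.185e-01")))  # 0.1185
```

Keeping the fields as strings therefore avoids both the conversion cost and any change in how the numbers are written.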

score 0 · Accepted Answer

What kind of file is this? Comma-separated? Plain text? If it is a *.csv file, you can try this:

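import csv

col1, col2, col6 = [], [], []
with open('filepath', 'r', newline='') as openFile:
    # skipinitialspace=True ignores spaces that directly follow a delimiter,
    # so runs of spaces between fields do not produce empty fields
    dataIn = csv.reader(openFile, delimiter=' ', skipinitialspace=True)
    for rows in dataIn:
        if not rows:  # skip blank lines
            continue
        col1.append(rows[0])
        col2.append(rows[1])
        col6.append(rows[5])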
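
One caveat when pointing csv.reader at the sample data: the columns are separated by runs of spaces, and with delimiter=' ' alone every extra space produces an empty field, so rows[5] would not be the sixth column. The short Python 3 sketch below illustrates this with an in-memory line and shows the standard skipinitialspace option as one way around it; it is an illustration, not part of the answer.

```python
import csv
import io

sample = "0.001       1.185e-01  1.185e-01\n"

# Every space is a delimiter, so runs of spaces yield empty strings.
plain = next(csv.reader(io.StringIO(sample), delimiter=" "))

# skipinitialspace=True ignores spaces that directly follow a delimiter.
skipped = next(csv.reader(io.StringIO(sample), delimiter=" ",
                          skipinitialspace=True))

print("" in plain)  # True
print(skipped)      # ['0.001', '1.185e-01', '1.185e-01']
```

For purely whitespace-separated data, line.split() gives the same clean fields without the csv module.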

score 0 · Accepted Answer

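Column data

This method works for any data file that meets the following requirements:

The data is separated by whitespace [i.e. spaces, tabs, carriage returns]
Data entries do not contain whitespace

The sample data given meets these requirements. This method uses Python 3 and regular expressions to extract specific columns from the data.

To use it simply:

Call the init(file) function once
- pass in the path to the data file
Then call getColm(i) as many times as needed
- pass in the column you need
- it returns an array of that column's entries.

Here is the code. Make sure to import the regular expression library re.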

import re

matrixOfFile = []

# Prep the matrixOfFile variable
def init(filepath):
    global matrixOfFile
    # Read the file content
    with open(filepath,'r') as file:
        fileContent = file.read()       
    # Split the file into rows
    rows = fileContent.split("\n")

    # Split rows into entries and add them to matrixOfFile
    for row in rows: # For each row, find all of the entries in the row that
                     # are non-space characters and add those entries to the
                     # matrix
        entries = re.findall(r"\S+", row)
        if entries:  # skip blank lines, e.g. the empty string after a trailing newline
            matrixOfFile.append(entries)

# Returns the ith column of the matrixOfFile
# i should be an int with 0 <= i < len(matrixOfFile[0])
def getColm(i):
    global matrixOfFile
    if i<0 or i>=len(matrixOfFile[0]):
        raise ValueError('Column '+str(i)+' does not exist')
    colum = []
    for row in matrixOfFile: # For each row, add whatever is in the ith 
                  # column to colum
        colum.append(row[i])

    return colum

# Absolute filepath might be necessary ( eg "C:/Windows/Something/Users/Documents/data.dat" )
init("data.dat") 
# Gets the first, second and sixth columns of data
print(getColm(0))
print(getColm(1))
print(getColm(5))
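
For comparison, the same idea can be written without the global variable and without re, since str.split() with no arguments also splits on runs of whitespace. This is a minimal sketch rather than a replacement for the code above; the names read_rows and get_column are hypothetical.

```python
def read_rows(filepath):
    """Return the non-blank lines of the file as lists of string entries."""
    with open(filepath, "r") as file:
        return [line.split() for line in file if line.strip()]


def get_column(rows, i):
    """Return the ith (0-based) entry of every row."""
    if not rows or i < 0 or i >= len(rows[0]):
        raise ValueError("Column " + str(i) + " does not exist")
    return [row[i] for row in rows]
```

With these helpers, rows = read_rows("data.dat") followed by get_column(rows, 5) would return the sixth column as a list of strings.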
