python - Extracting columns from a .dat file using np.loadtxt in Python

Question

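I have this text file: www2.geog.ucl.ac.uk/~plewis/geogg122/python/delnorte.dat

I want to extract columns 3 and 4.

I am using np.loadtxt and getting this error:

ValueError: invalid literal for float(): 2000-01-01

I am only interested in the year 2005. How can I extract both columns?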

score 1 · Accepted Answer

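You can pass loadtxt a custom converter function for specific columns.
Since you are only interested in the year, I use a lambda function that splits the date on '-' and converts the first part to an int: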

import numpy as np

data = np.loadtxt('delnorte.dat',
                  usecols=(2, 3),
                  converters={2: lambda s: int(s.split('-')[0])},
                  skiprows=27)

array([[ 2000.,   190.],
       [ 2000.,   170.],
       [ 2000.,   160.],
       ..., 
       [ 2010.,   185.],
       [ 2010.,   175.],
       [ 2010.,   165.]])
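To make the converter step easier to try without downloading the file, here is a minimal sketch on made-up data. The sample rows, the two-line header, and the helper name `year_from_date` are assumptions for illustration only and are not taken from delnorte.dat. The helper also decodes bytes, since older NumPy releases pass bytes rather than str to converters.

```python
import io

import numpy as np


def year_from_date(field):
    """Return the year of a 'YYYY-MM-DD' field as an int."""
    # Older NumPy releases hand converters bytes, newer ones hand them str.
    if isinstance(field, bytes):
        field = field.decode()
    return int(field.split('-')[0])


# Made-up stand-in for the data file: two header lines, then five columns.
sample = io.StringIO(
    "header line one\n"
    "header line two\n"
    "1 1 2004-12-31 150 0\n"
    "1 2 2005-01-01 210 0\n"
    "1 3 2005-01-02 190 0\n"
    "1 4 2006-01-01 175 0\n"
)

# Read only columns 2 and 3; column 2 goes through the converter.
data = np.loadtxt(sample, usecols=(2, 3),
                  converters={2: year_from_date}, skiprows=2)
print(data.shape)  # (4, 2)
```

Every value ends up as a float, because loadtxt returns a single homogeneous array by default.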

To keep only the rows for the year 2005, you can use logical (boolean) indexing in numpy:

data_2005 = data[data[:,0] == 2005]

array([[ 2005.,   210.],
       [ 2005.,   190.],
       [ 2005.,   190.],
       [ 2005.,   200.],
        ....])
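The boolean indexing step can be broken into its parts. This is a small sketch on made-up numbers in the same layout as the loadtxt result (year in column 0, value in column 1); the names `mask` and `values_2005` are illustrative.

```python
import numpy as np

# Made-up values: year in column 0, measurement in column 1.
data = np.array([[2004., 150.],
                 [2005., 210.],
                 [2005., 190.],
                 [2006., 175.]])

mask = data[:, 0] == 2005      # one boolean per row
data_2005 = data[mask]         # keep the rows where mask is True
values_2005 = data[mask, 1]    # only the value column of those rows

print(mask)
print(values_2005)
```

Comparing floats with `==` is safe here because the years are whole numbers that floats represent exactly.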

score 0 · Accepted Answer

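I agree with using the csv module. I adapted this answer, "Reading csv files in scipy/numpy in Python", to fit your question. I am not sure whether you need the data in a numpy array or whether a list is enough.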

import csv
import numpy as np

fields = 5   # expected number of tab-separated fields per record
records = []
with open("delnorte.dat.txt", "r") as f:
    for record in csv.reader(f, delimiter='\t'):
        # skip comment lines and records without the expected number of fields
        if len(record) != fields or record[0].startswith('#'):
            continue
        if record[2][0:4] == '2005':
            # assuming you want columns 3 & 4 with the first column indexed as 0
            records.append([int(record[3]), record[4]])

# if desired, slice the list of lists to put a single column into a numpy array
npData = np.asarray([npD[0] for npD in records])

score 0 · Accepted Answer

You should not use NumPy's loadtxt to read these values. Use the csv module instead to load the file and read its data.
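A minimal sketch of what the csv-based route might look like, run on in-memory data so it is self-contained. The sample rows and the tab delimiter are assumptions; the real file may be separated by other whitespace, in which case the `delimiter` argument would need to change. Unlike the converter approach, this keeps the full date strings next to the numeric values.

```python
import csv
import io

import numpy as np

# Made-up tab-separated stand-in for the data file.
sample = io.StringIO(
    "# comment line\n"
    "1\t1\t2004-12-31\t150\t0\n"
    "1\t2\t2005-01-01\t210\t0\n"
    "1\t3\t2005-01-02\t190\t0\n"
)

rows = []
for record in csv.reader(sample, delimiter='\t'):
    # Skip blank lines and comment lines.
    if not record or record[0].startswith('#'):
        continue
    date, value = record[2], record[3]
    if date.startswith('2005'):
        rows.append((date, float(value)))

dates = [d for d, _ in rows]               # full date strings for 2005
values = np.array([v for _, v in rows])    # matching values as floats
print(dates, values)
```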
