
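I am filling a numpy array with the contents of a CSV file, and the number of columns in that file can change. I am trying to combine the first two string columns (date + time) into a date object, and I found an example of how to do that on Stack Overflow. However, that example would require me to change the script every time the number of columns changes.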

Here is the example:

#!/usr/bin/env python3
# loop over rows with a variable number of columns, but only the first two matter

import datetime
from io import StringIO  # Python 3 replacement for the Python 2 StringIO module

import numpy as np

# simulate a csv file
data = StringIO("""
Title
Date,Time,Speed
,,(m/s)
2012-04-01,00:10, 85
2012-04-02,00:20, 86
2012-04-03,00:30, 87
""".strip())

next(data)  # eat away the first line, which is the title
header = [item.strip() for item in next(data).split(',')]  # get the headers
# print(header)
# skip the units row; encoding=None makes the string columns str instead of bytes
arr = np.genfromtxt(data, delimiter=',', skip_header=1, dtype=None, encoding=None)
arr.dtype.names = header  # assign the header to names so columns can be indexed by name

y1 = arr['Speed']  # column headings were assigned previously by arr.dtype.names = header

# Here is an example from:
# https://stackoverflow.com/questions/7500864/python-array-of-datetime-objects-from-numpy-ndarray

date_objects = np.array([datetime.datetime.strptime(a + b, "%Y-%m-%d%H:%M")
                         for a, b, c in arr])
print(date_objects)

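Question: the for statement above iterates over a numpy array. Right now I name a, b, c because I have three columns, but if I add a fourth column the statement breaks with "ValueError: too many values to unpack", which is not very robust. If I only care about the first two columns, a and b, how can I rewrite it? Is there a way to write something like "a, b, ... in arr"?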
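In Python 3 (the original script is Python 2), extended iterable unpacking gives almost exactly the "a, b, ..." syntax asked about: name the first two fields and let a starred name collect the rest. A minimal sketch, assuming the array is built directly with an explicit structured dtype rather than read through genfromtxt; the fourth field Dir is hypothetical, added only to show that an extra column no longer breaks the loop:

```python
import datetime

import numpy as np

# Hypothetical four-column structured array standing in for the CSV data
arr = np.array(
    [("2012-04-01", "00:10", 85, 1),
     ("2012-04-02", "00:20", 86, 2)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", int), ("Dir", int)],
)

# a and b receive the first two fields; *rest absorbs however many columns follow
date_objects = np.array([
    datetime.datetime.strptime(a + b, "%Y-%m-%d%H:%M")
    for a, b, *rest in arr
])
print(date_objects)
```

The starred form is Python 3 only; under Python 2 the positional indexing shown in the answer is the portable choice.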

I have tried slicing arr down to the first two columns.

# Note 1: slicing fails with IndexError: too many indices
#arr_date_time = arr[:,:2]
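The slice fails because a structured array like this one is one-dimensional: each row is a single element, and the columns are fields rather than a second axis. Fields are selected by name instead. A minimal sketch, assuming a NumPy version with multi-field indexing (arr[list_of_names]); taking the first two names from arr.dtype.names keeps it independent of how many columns follow:

```python
import numpy as np

arr = np.array(
    [("2012-04-01", "00:10", 85), ("2012-04-02", "00:20", 86)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", int)],
)

# The array has one axis, so a 2-D slice is rejected
try:
    arr[:, :2]
except IndexError as exc:
    print("IndexError:", exc)

# Select the first two fields by name, whatever they happen to be called
first_two = list(arr.dtype.names[:2])
arr_date_time = arr[first_two]

print(arr_date_time.dtype.names)  # ('Date', 'Time')
```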

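A workaround for the slicing error is to set dtype=object instead of setting dtype.names, but I want to set dtype.names because it makes indexing columns more readable. See my related post "Numpy set dtype=None, cannot splice columns and set dtype=object cannot set dtype.names".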
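With dtype.names set, the two columns can also be pulled out by name and zipped, so whole rows are never unpacked and the remaining columns are never touched. A minimal sketch, assuming the field names Date and Time from the example header and an array built directly instead of via genfromtxt:

```python
import datetime

import numpy as np

arr = np.array(
    [("2012-04-01", "00:10", 85),
     ("2012-04-02", "00:20", 86),
     ("2012-04-03", "00:30", 87)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", int)],
)

# Named field access returns 1-D arrays; only Date and Time are read
date_objects = np.array([
    datetime.datetime.strptime(d + t, "%Y-%m-%d%H:%M")
    for d, t in zip(arr["Date"], arr["Time"])
])
print(date_objects)
```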


1 Answer


Try this:

date_objects = np.array([datetime.datetime.strptime(row[0] + row[1], "%Y-%m-%d%H:%M") 
                    for row in arr])
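Indexing each row by position reads only the first two fields, so the loop keeps working when columns are added. A minimal sketch wrapping the answer's expression in a hypothetical helper, to_datetimes, and running it on both a three-column and a four-column array (the extra Dir field is also hypothetical):

```python
import datetime

import numpy as np


def to_datetimes(arr):
    """Combine the first two string fields of each row into a datetime."""
    return np.array([
        datetime.datetime.strptime(row[0] + row[1], "%Y-%m-%d%H:%M")
        for row in arr
    ])


arr3 = np.array(
    [("2012-04-01", "00:10", 85)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", int)],
)
arr4 = np.array(
    [("2012-04-01", "00:10", 85, 1), ("2012-04-02", "00:20", 86, 2)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", int), ("Dir", int)],
)
print(to_datetimes(arr3))
print(to_datetimes(arr4))
```

Under Python 3, if the array comes from genfromtxt on NumPy 1.x without encoding=None, the string fields are bytes and strptime will reject them, so pass encoding=None when reading the file.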
answered 2013-07-19T06:58:18.777