python - variable number of numpy array for loop arguments needed to match a variable number of columns

Question

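I am filling a numpy array with the contents of a CSV file, and the number of columns in the file can change. I am trying to combine the first two string columns (date + time) into a date object, using an example I found on Stack Overflow. However, that example would force me to edit the script every time the number of columns changes.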

Here is the example:

#!/usr/bin/env python3
# variable number of numpy array for loop arguments, but only care about the first two

import datetime
import io

import numpy as np

# simulate a csv file
data = io.StringIO("""
Title
Date,Time,Speed
,,(m/s)
2012-04-01,00:10, 85
2012-04-02,00:20, 86
2012-04-03,00:30, 87
""".strip())

next(data)  # eat away the first line, which is the title
header = [item.strip() for item in next(data).split(',')]  # get the headers
# print(header)
# skip_header=1 skips the units row; encoding makes the string columns str instead of bytes
arr = np.genfromtxt(data, delimiter=',', skip_header=1, dtype=None, encoding='utf-8')
arr.dtype.names = header  # assign the header to names so we can use it for indexing

y1 = arr['Speed']  # column headings were assigned previously by arr.dtype.names = header

# Here is an example from:
# https://stackoverflow.com/questions/7500864/python-array-of-datetime-objects-from-numpy-ndarray

date_objects = np.array([datetime.datetime.strptime(a + b, "%Y-%m-%d%H:%M")
                         for a, b, c in arr])
print(date_objects)

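Problem: the for statement above iterates over a numpy array. Right now I write a,b,c because I have three columns, but if I add a fourth column the statement breaks with ValueError: too many values to unpack, which is not very robust. If I only care about the first two columns, a and b, how can I rewrite it? Is there a way to say something like a,b,... in arr?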
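In Python 3, extended iterable unpacking (PEP 3132) is the closest thing to "a,b,...": the loop binds the first two fields and a starred name soaks up whatever is left. A minimal sketch, assuming Python 3 and a structured array built in memory instead of read from the CSV; the `Direction` field is a hypothetical fourth column added only to show that extra fields are ignored:

```python
import datetime

import numpy as np

# Structured array with four fields; "Direction" is a hypothetical extra column.
arr = np.array(
    [("2012-04-01", "00:10", 85, 270),
     ("2012-04-02", "00:20", 86, 180)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", int), ("Direction", int)],
)

# a and b take the first two fields of each record; *_ absorbs any remaining
# fields, so the loop keeps working when columns after the second one change.
date_objects = np.array([
    datetime.datetime.strptime(a + b, "%Y-%m-%d%H:%M")
    for a, b, *_ in arr
])
print(date_objects)
```

Starred targets are Python 3 only; under Python 2, positional indexing of each record is the portable option.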

I have already tried slicing arr down to its first two columns:

# Note 1: the slice fails with "IndexError: too many indices"
#arr_date_time = arr[:,:2]
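The slice fails because `np.genfromtxt(..., dtype=None)` on mixed-type columns returns a one-dimensional structured array: each row is a single record, so there is no second axis to slice. Selecting fields by name (or by the position of the name in `dtype.names`) does work. A minimal sketch, assuming NumPy 1.16 or later (where indexing with a list of field names returns a view) and an in-memory array standing in for the CSV data:

```python
import numpy as np

# One-dimensional structured array: shape (3,), one record per CSV row.
arr = np.array(
    [("2012-04-01", "00:10", 85),
     ("2012-04-02", "00:20", 86),
     ("2012-04-03", "00:30", 87)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", int)],
)
print(arr.shape)  # (3,)

# Two-dimensional slicing is rejected because the array has only one axis.
try:
    arr[:, :2]
except IndexError as exc:
    print("IndexError:", exc)

# Select "columns" by field name instead of by position on a second axis.
date_time = arr[["Date", "Time"]]

# Or take the first two fields positionally, whatever they are called.
first_two = arr[list(arr.dtype.names[:2])]
print(first_two.dtype.names)  # ('Date', 'Time')
```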

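A workaround for the slicing error is to set dtype=object instead of setting dtype.names, but I want to set dtype.names because it makes indexing the columns more readable. See my related post: Numpy set dtype=None, cannot splice columns and set dtype=object cannot set dtype.names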
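Keeping `dtype.names` does not prevent pulling out the date and time columns: each named field is an ordinary 1-D array, so the two string columns can be joined and converted without a per-row loop. This is a vectorized alternative, not the approach from the linked posts; the sketch assumes the fields are Unicode strings named `Date` and `Time` and that minute resolution (`datetime64[m]`) is enough:

```python
import numpy as np

arr = np.array(
    [("2012-04-01", "00:10", 85),
     ("2012-04-02", "00:20", 86),
     ("2012-04-03", "00:30", 87)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", int)],
)

# Join "YYYY-MM-DD" and "HH:MM" element-wise with the ISO 8601 "T" separator.
iso = np.char.add(np.char.add(arr["Date"], "T"), arr["Time"])

# Parse the whole column at once into numpy datetime64 values (minute precision).
timestamps = iso.astype("datetime64[m]")
print(timestamps)
```

If Python `datetime` objects are needed instead of `datetime64`, `timestamps.astype(object)` converts them.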

score 1 · Accepted Answer

Try this:

date_objects = np.array([datetime.datetime.strptime(row[0] + row[1], "%Y-%m-%d%H:%M") 
                    for row in arr])
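Indexing each record positionally with `row[0]` and `row[1]` only touches the first two fields, so adding columns after `Time` no longer breaks the comprehension. A self-contained sketch under Python 3, assuming a recent NumPy (which uses `skip_header` rather than the removed `skiprows`, and needs `encoding` so string fields come back as `str` rather than `bytes`); the `Direction` column is hypothetical and only shows a fourth column being ignored:

```python
import datetime
import io

import numpy as np

# Simulated CSV with a fourth, hypothetical "Direction" column.
data = io.StringIO("""Title
Date,Time,Speed,Direction
,,(m/s),(deg)
2012-04-01,00:10, 85, 270
2012-04-02,00:20, 86, 180
2012-04-03,00:30, 87, 90""")

next(data)  # skip the title line
header = [item.strip() for item in next(data).split(",")]

# skip_header=1 drops the units row; encoding makes the string fields str.
arr = np.genfromtxt(data, delimiter=",", skip_header=1, dtype=None, encoding="utf-8")
arr.dtype.names = header

# Only the first two fields are read, whatever the total number of columns.
date_objects = np.array([
    datetime.datetime.strptime(row[0] + row[1], "%Y-%m-%d%H:%M")
    for row in arr
])
print(date_objects)
```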
