python - Split a list of datetimes by year+month in Python

Question

I have the following csv file:

# simulate a csv file
from StringIO import StringIO
data = StringIO("""
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip())

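I want to group the rows by year+month plus category (i.e. A, B, C).

I want the final data grouped by month and then by category, as views of the original data: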

2012-04, A

>>  array[0,] => 2012-04-01,00:10, A, 10

>>  array[3,] => 2012-04-02,00:10, A, 18

2012-04, B

>>  array[1,] => 2012-04-01,00:20, B, 11

>>  array[2,] => 2012-04-01,00:30, B, 12

2012-05, A

>>  array[4,] => 2012-05-02,00:20, A, 14

...

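Then I want to iterate over the groups and plot each one with the same function.

I have seen a similar question about splitting a list of datetimes into days, and I can do that in my case a). In case b), however, I am having trouble turning it into a year+month split.

Here is a snippet showing the problem I have so far: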

#! /usr/bin/python

import numpy as np
import csv
import os
from  datetime import datetime

def strToDate(string):
    d = datetime.strptime(string, '%Y-%m-%d')
    return d

def strToMonthDate(string):
    # Truncate the date to the first day of its month
    d = datetime.strptime(string, '%Y-%m-%d')
    d_by_month = datetime(d.year, d.month, 1)
    return d_by_month

# simulate a csv file
from StringIO import StringIO
data = StringIO("""
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip())

arr = np.genfromtxt(data, delimiter=',', dtype=object)


# a) If we were to just group by dates
# Get unique dates
#keys = np.unique(arr[:,0])
#keys1 = np.unique(arr[:,2])
# Group by unique dates
#for key in keys:
#   print key   
#   for key1 in keys1:      
#       group = arr[ (arr[:,0]==key) & (arr[:,2]==key1) ]                       
#       if group.size:
#           print "\t" + key1
#           print group
#   print "\n"      

# b) But if we want to group by year+month in the dates 
dates_by_month = np.array(map(strToMonthDate, arr[:,0]))
keys2 = np.unique(dates_by_month)
print dates_by_month
# >> [datetime.datetime(2012, 4, 1, 0, 0), datetime.datetime(2012, 4, 1, 0, 0), ...
print "\n"  
print keys2
# >> [2012-04-01 00:00:00 2012-05-01 00:00:00 2012-06-01 00:00:00]

for key in keys2:
    print key
    print type(key)
    # Boolean mask: rows whose month equals this key
    group = arr[dates_by_month==key]
    print group
    print "\n"

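Problem: I get the monthly keys, but for the groups all I get is [2012-04-01 00:10 A 10] for every group. The keys in keys2 are of type datetime.datetime. Any idea what is wrong? Suggestions for alternative implementations are welcome. I would rather not use an itertools.groupby solution, because it returns an iterator rather than an array, which is less suitable for plotting.

Edit 1: Problem solved. The dates_by_month that I use for advanced indexing in case b) should have been initialised as an np.array rather than as the list returned by map, i.e. dates_by_month = np.array(map(strToMonthDate, arr[:,0])). I have fixed this in the snippet above, and the example now works.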
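To illustrate the difference described in Edit 1, here is a minimal sketch written for Python 3 (the snippet above is Python 2). The names months_list, months_arr, list_result and mask are made up for this illustration. Comparing a plain list with a datetime gives a single bool, while comparing a NumPy array gives an element-wise mask. Older NumPy releases appear to have treated a scalar False index like index 0, which would be consistent with every group showing only the first row.

```python
from datetime import datetime

import numpy as np

# Hypothetical month-truncated dates, standing in for dates_by_month
months_list = [datetime(2012, 4, 1), datetime(2012, 4, 1), datetime(2012, 5, 1)]
key = datetime(2012, 4, 1)

# A list compared with a datetime is one Python bool, not a mask
list_result = (months_list == key)

# An object array is compared element by element, giving a boolean mask
months_arr = np.array(months_list)
mask = (months_arr == key)

print(list_result)
print(mask)
```

Note that in Python 3 map returns a lazy iterator rather than a list, so it would need to be wrapped as np.array(list(map(...))) before comparing.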

score 4 · Accepted Answer

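I found the problem in my original solution.

In case b),

dates_by_month = map(strToMonthDate, arr[:,0])

returns a list rather than a numpy array. The advanced indexing:

group = arr[dates_by_month==key]

therefore does not work. If I instead have:

dates_by_month = np.array(map(strToMonthDate, arr[:,0]))

then the grouping works as expected.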
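For reference, a sketch of the full year+month and category grouping, ported to Python 3. This is one possible way to do it rather than the exact code from the question: it reads the rows with the csv module instead of np.genfromtxt, and the names str_to_month and groups are introduced only for this example.

```python
import csv
from datetime import datetime
from io import StringIO

import numpy as np

# simulate a csv file
data = StringIO("""
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip())

# Read every field as a string and strip the space after each comma
arr = np.array([[field.strip() for field in row] for row in csv.reader(data)])

def str_to_month(text):
    # Truncate a date string to the first day of its month
    d = datetime.strptime(text, '%Y-%m-%d')
    return datetime(d.year, d.month, 1)

# map() is lazy in Python 3, so build a list before making the array
dates_by_month = np.array(list(map(str_to_month, arr[:, 0])))

# Collect the non-empty (month, category) groups
groups = {}
for month in sorted(set(dates_by_month.tolist())):
    for category in sorted(set(arr[:, 2].tolist())):
        mask = (dates_by_month == month) & (arr[:, 2] == category)
        if mask.any():
            groups[(month.strftime('%Y-%m'), category)] = arr[mask]

for (month, category), group in groups.items():
    print(month, category)
    print(group)
```

Each value in groups is a 2-D array that can be passed to the same plotting function. Note that boolean-mask indexing returns copies of the selected rows, not views of arr.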
