python - What is the fastest way to sample slices of a numpy array?

Question

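I have a 3D (time, X, Y) numpy array containing 6-hourly time series for a few years (say 5). I would like to create a sampled time series containing 1 instance of each calendar day, randomly taken from the available records (5 possibilities per day), as follows.

1 January 2006
2 January 2011
3 January 2009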
...

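This means I need to take 4 values from 01/01/2006, 4 values from 02/01/2011, and so on. I have a working version that works as follows:

Reshape the input array to add a "year" dimension (time, year, X, Y)
Create a 365-value array of randomly generated integers between 0 and 4
Use np.repeat and the array of integers to extract only the relevant values:

Example: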

sampledValues = Variable[np.arange(numberOfDays * ValuesPerDays), sampledYears.repeat(ValuesPerDays),:,:]
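For readers who want to try this end to end, here is a minimal, self-contained sketch of the three steps above. It is only an illustration under stated assumptions: the data is synthetic, the grid size is made up, the input is assumed to be chronological with 29 February already removed, and the snake_case names are stand-ins for the ones in the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 5 years, 365 days per year, 4 values per day (6-hourly).
n_years, n_days, values_per_day = 5, 365, 4
nx, ny = 3, 2  # hypothetical small grid
steps_per_year = n_days * values_per_day

# Synthetic stand-in for the real (time, X, Y) array.
data = rng.normal(size=(n_years * steps_per_year, nx, ny))

# Step 1: add a "year" axis. reshape gives (year, step, X, Y);
# swapaxes turns that into (step, year, X, Y) without copying.
variable = data.reshape(n_years, steps_per_year, nx, ny).swapaxes(0, 1)

# Step 2: one random year index (0..4) per calendar day.
sampled_years = rng.integers(0, n_years, size=n_days)

# Step 3: repeat each year index 4 times so it lines up with the
# 6-hourly steps, then pick one (step, year) pair per time step.
steps = np.arange(steps_per_year)
sampled_values = variable[steps, sampled_years.repeat(values_per_day), :, :]
```

The result has shape (1460, X, Y): a single fancy-indexing call, with no Python-level loop over days.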

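This seems to work, but I was wondering whether this is the best/fastest approach to my problem. Speed is important because I am doing this in a loop, and would benefit from testing as many cases as possible.

Am I doing this right?

Thanks

EDIT: I forgot to mention that I filtered the input dataset to remove 29 February for leap years.

Basically, the aim of this operation is to find a 365-day sample that closely matches the long-term time series in terms of mean, etc. If the sampled time series passes my quality test, I want to export it and start again.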

score 3 · Accepted Answer

2008 has 366 days, so don't reshape.

Have a look at scikits.timeseries:

import scikits.timeseries as ts

start_date = ts.Date('H', '2006-01-01 00:00')
end_date = ts.Date('H', '2010-12-31 18:00')
arr3d = ... # your 3D array [time, X, Y]

dates = ts.date_array(start_date=start_date, end_date=end_date, freq='H')[::6]
t = ts.time_series(arr3d, dates=dates)
# just make sure arr3d.shape[0] == len(dates) !

You can now access the data through t's day/month/year attributes:

t[np.logical_and(t.day == 1, t.month == 1)]

For example:

for day_of_year in xrange(1, 366):
    year = np.random.randint(2006, 2011)

    t[np.logical_and(t.day_of_year == day_of_year, t.year == year)]
    # returns a [4, X, Y] array with data from that day

Using the attributes of t makes this work with leap years too.
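If scikits.timeseries is not available, the same select-by-date idea can be sketched with plain numpy datetime64 values. This is a swapped-in technique, not the scikits.timeseries API; the synthetic array, the grid size, and the use of 2007 as a non-leap reference year are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# 6-hourly timestamps for 2006 through 2010, leap day included.
times = np.arange(np.datetime64("2006-01-01T00"),
                  np.datetime64("2011-01-01T00"),
                  np.timedelta64(6, "h"))
arr3d = rng.normal(size=(times.size, 3, 2))  # synthetic [time, X, Y] data

# Truncate each timestamp to its calendar day for easy comparison.
day_stamp = times.astype("datetime64[D]")

pieces = []
# Walk the calendar days of a non-leap reference year (assumed: 2007),
# so 29 February is never requested.
for ref_day in np.arange(np.datetime64("2007-01-01"), np.datetime64("2008-01-01")):
    year = rng.integers(2006, 2011)  # upper bound is exclusive
    # Same month and day, but in the randomly drawn year.
    target = np.datetime64(f"{year}-{str(ref_day)[5:]}")
    pieces.append(arr3d[day_stamp == target])  # the 4 values of that day

sampled = np.concatenate(pieces)
```

Because the selection is by date rather than by position, the leap day in 2008 does not shift the later days of that year.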

score 0 · Accepted Answer

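I don't see a real need to reshape the array, since you can embed the year-size information in your sampling process and leave the array in its original shape.

For example, you can generate a random offset (from 0 to 365), and pick the slice with index n*365 + offset.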
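One possible reading of this, adapted to the question's layout, is to compute flat indices into the original (time, X, Y) array instead of reshaping it. The sketch below assumes 5 years of 365 days with 4 values per day and 29 February removed; here n is the randomly drawn year and the offset is the calendar day, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed layout: chronological, 5 years x 365 days x 4 values per day.
n_years, n_days, values_per_day = 5, 365, 4
steps_per_year = n_days * values_per_day
data = rng.normal(size=(n_years * steps_per_year, 3, 2))  # synthetic data

# One random year n per calendar day.
sampled_years = rng.integers(0, n_years, size=n_days)

# Flat index of every 6-hourly step: n * steps_per_year + step_within_year,
# which equals (n * 365 + day) * 4 + k for the k-th value of that day.
flat_index = (sampled_years.repeat(values_per_day) * steps_per_year
              + np.arange(steps_per_year))

sampled = data[flat_index]  # shape (1460, X, Y), original array untouched
```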

Anyway, I don't think your question is complete, because I didn't quite understand what you need to do, or why.
