python - Resampling a pandas Series

Question

I have the following pandas Series:

    dummy_array = pd.Series(np.array(range(-10, 11)), index=(np.array(range(0, 21))/10))

This produces the following array:

How do I resample it? I read the documentation, which suggests this:

    dummy_array.resample('20S').mean()

But it doesn't work. Any ideas?

Thank you.

Edit:

I want my final vector to have double the frequency, so something like this:

0.0   -10
0.05   -9.5
0.1    -9
0.15    -8.5
0.2    -8
0.25    -7.5
etc.

score 2 · Accepted Answer

Here is a solution using np.linspace(), .reindex(), and .interpolate():

Create the Series dummy_array as described above.

# get properties of original index
start = dummy_array.index.min()
end = dummy_array.index.max()
num_gridpoints_orig = dummy_array.index.size

# number of grid points in the new index:
# one extra point between each pair of adjacent original points
num_gridpoints_new = (num_gridpoints_orig * 2) - 1

# create the new index with half the step size (2n - 1 grid points)
idx_new = np.linspace(start, end, num_gridpoints_new)

# reindex the Series; the new grid points get NaN,
# and we replace these NaNs with linearly interpolated values
df2 = dummy_array.reindex(index=idx_new).interpolate()

print(df2.head())

0.00   -10.0
0.05    -9.5
0.10    -9.0
0.15    -8.5
0.20    -8.0
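
A note on why the resample call in the question fails: resample only works on a DatetimeIndex, TimedeltaIndex, or PeriodIndex, and dummy_array has a plain float index. If the index values can be read as seconds (that is an assumption here, the question does not give units), one possible sketch is to convert the index to a TimedeltaIndex first and then upsample. The names millis, timed, and upsampled are only illustrative.

```python
import numpy as np
import pandas as pd

dummy_array = pd.Series(np.arange(-10, 11), index=np.arange(0, 21) / 10)

# Assumption: the float index is in seconds.
# Go through integer milliseconds so the conversion is not affected by float rounding.
millis = np.round(dummy_array.index.to_numpy() * 1000).astype("int64")
timed = dummy_array.copy()
timed.index = pd.to_timedelta(millis, unit="ms")

# Original step is 100 ms; upsample to 50 ms and fill the new points linearly.
upsampled = timed.resample("50ms").interpolate()

# Convert the index back to float seconds.
upsampled.index = upsampled.index.total_seconds()

print(upsampled.head())
```

This keeps the original points and adds one interpolated point between each adjacent pair, which is the doubled frequency asked for in the edit.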

score 0 · Accepted Answer

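Build a list of shifted points from the original array (each index moved by 0.05 and each value by 0.5). Then split it into values and index to create a new pd.Series, concatenate it with the original Series, and re-sort.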

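# new list of [index, value] pairs, each shifted half a step from an original point
ups = [[x+0.05,y+0.5] for x,y in zip(dummy_array.index, dummy_array)]
idx = [i[0] for i in ups]
val = [i[1] for i in ups]
d2 = pd.Series(val, index=idx)
d3 = pd.concat([dummy_array,d2], axis=0)
# sort by index so the result stays ordered even if the values are not monotonic
d3.sort_index(inplace=True)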

d3
0.00   -10.0
0.05    -9.5
0.10    -9.0
0.15    -8.5
0.20    -8.0
0.25    -7.5
0.30    -7.0
0.35    -6.5
0.40    -6.0
0.45    -5.5
0.50    -5.0
0.55    -4.5
0.60    -4.0
0.65    -3.5
0.70    -3.0
0.75    -2.5
0.80    -2.0
0.85    -1.5
0.90    -1.0
0.95    -0.5
1.00     0.0
1.05     0.5
1.10     1.0
1.15     1.5
1.20     2.0
1.25     2.5
1.30     3.0
1.35     3.5
1.40     4.0
1.45     4.5
1.50     5.0
1.55     5.5
1.60     6.0
1.65     6.5
1.70     7.0
1.75     7.5
1.80     8.0
1.85     8.5
1.90     9.0
1.95     9.5
2.00    10.0
2.05    10.5
dtype: float64
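
The approach above relies on knowing the slope in advance (+0.5 per 0.05 step) and also adds a point at 2.05, outside the original range. As a hedged alternative that makes neither assumption, np.interp can evaluate the series on a finer grid directly. This is only a sketch and not part of the original answer; new_index and upsampled are illustrative names.

```python
import numpy as np
import pandas as pd

dummy_array = pd.Series(np.arange(-10, 11), index=np.arange(0, 21) / 10)

# New grid with half the original step, restricted to the original range.
new_index = np.linspace(
    dummy_array.index.min(),
    dummy_array.index.max(),
    2 * len(dummy_array) - 1,
)

# np.interp needs increasing x-coordinates, which holds for this index.
upsampled = pd.Series(
    np.interp(new_index, dummy_array.index.to_numpy(), dummy_array.to_numpy()),
    index=new_index,
)

print(upsampled.head())
```

Because np.interp works on coordinates rather than on exact index labels, it does not depend on the new grid points matching the old float index values bit for bit.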

score 0 · Accepted Answer

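Thank you all for your contributions. After looking at the answers and thinking about it some more, I found a more general solution that should handle all possible cases. Here I want to upsample dummy_arrayA onto the same index as dummy_arrayB. I create a new index that contains the points of both A and B, use the reindex and interpolate functions to compute the new values, and finally delete the old index points so that the result has the same size as dummy_arrayB.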

import pandas as pd
import numpy as np

# Create Dummy arrays
dummy_arrayA = pd.Series(np.array(range(0, 4)), index=[0,0.5,1.0,1.5])
dummy_arrayB = pd.Series(np.array(range(0, 5)), index=[0,0.4,0.8,1.2,1.6])

# Create new index based on array A
new_ind = pd.Index(dummy_arrayA.index)
# merge index A and B
new_ind=new_ind.union(dummy_arrayB.index)

# Reindex A onto the merged index: existing values are kept and the new points become NaN.
# Then interpolate with method="index" so that the index values are used as positions.
df2 = dummy_arrayA.reindex(index=new_ind).interpolate(method="index")

# Find the points to delete: everything in the merged index that is not in B.
# Intersecting with B's index keeps points common to A and B from being deleted.
New_ind_inter = dummy_arrayB.index.intersection(new_ind)
new_ind = new_ind.difference(New_ind_inter)

# Delete the old points so that the final array matches dummy_arrayB's index
df2 = df2.drop(new_ind)

print(df2)
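
The steps above can also be wrapped into a small reusable function. This is a minimal sketch under a few assumptions: interpolate_onto is a hypothetical helper name, the final step uses reindex on the target index instead of drop, and limit_area="inside" is my choice so that target points outside A's range stay NaN rather than being filled from the last known value.

```python
import numpy as np
import pandas as pd


def interpolate_onto(series, target_index):
    """Interpolate `series` at the positions in `target_index` (hypothetical helper)."""
    target_index = pd.Index(target_index)
    # Merge both indexes so the original points are available for interpolation.
    merged = series.index.union(target_index)
    # method="index" interpolates using the index values as positions;
    # limit_area="inside" leaves points outside the original range as NaN.
    filled = series.reindex(merged).interpolate(method="index", limit_area="inside")
    # Keep only the target points.
    return filled.reindex(target_index)


dummy_arrayA = pd.Series(np.arange(0, 4), index=[0, 0.5, 1.0, 1.5])
dummy_arrayB = pd.Series(np.arange(0, 5), index=[0, 0.4, 0.8, 1.2, 1.6])

result = interpolate_onto(dummy_arrayA, dummy_arrayB.index)
print(result)
```

In this example the point at 1.6 lies beyond A's last index value (1.5), so it is left as NaN. Dropping the limit_area argument would fill it instead, if that is the behaviour you want.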
