
My DataFrame looks like this:

         ID             date         var1         var2
0        1100289299522  2020-12-01   109.046450   8.0125
1        1100289299522  2020-12-02   104.494946   6.1500
2        1100289299522  2020-12-03   117.011582   5.9375
3        1100289299522  2020-12-04   109.615388   5.4750
4        1100289299522  2020-12-05   142.803438   3.8500
...                ...         ...          ...      ...
960045  21380318319578  2021-05-27     7.524261  15.4875
960046  21380318319578  2021-05-28     3.256770  17.3625
960047  21380318319578  2021-05-29     0.561512  18.3250
960048  21380318319578  2021-05-30     1.347629  18.7625
960049  21380318319578  2021-05-31     0.112302  20.0750

Is there a simple way in pandas to get one row per ID, with columns laid out as shown below?

ID             2020-12-01_var1  2020-12-02_var1 ...  2021-05-31_var1  2020-12-01_var2  2020-12-02_var2 ...  2021-05-31_var2
1100289299522  109.046450       104.494946      ...  ___              8.0125           6.1500          ...  ___

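I could then apply a dimensionality-reduction algorithm (such as TSNE) and possibly classify each time series (and ID).

Do you think this is the right approach?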


1 Answer


Try:

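# Reshape long to wide: one row per ID, one column per (variable, date) pair.
out = df.pivot(index='ID', columns='date', values=['var1', 'var2'])
# Flatten the (variable, date) MultiIndex to 'var1_2020-12-01'-style labels (assumes string dates).
out.columns = out.columns.to_flat_index().str.join('_')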

For your sample:

>>> out
                var1_2020-12-01  var1_2020-12-02  var1_2020-12-03  ...  var2_2021-05-29  var2_2021-05-30  var2_2021-05-31
ID                                                                 ...
1100289299522         109.04645       104.494946       117.011582  ...              NaN              NaN              NaN
21380318319578              NaN              NaN              NaN  ...           18.325          18.7625           20.075

[2 rows x 20 columns]
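
If `date` is stored as datetime64 rather than as strings, or if you want the `date_var` label order from the question, a variant along the following lines may help. This is only a sketch: the four-row frame is a hypothetical miniature built from rows shown in the question, and the `strftime` format is an assumption about how the dates should be rendered.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: two rows per ID.
df = pd.DataFrame({
    "ID": [1100289299522, 1100289299522, 21380318319578, 21380318319578],
    "date": pd.to_datetime(["2020-12-01", "2020-12-02", "2021-05-30", "2021-05-31"]),
    "var1": [109.046450, 104.494946, 1.347629, 0.112302],
    "var2": [8.0125, 6.1500, 18.7625, 20.0750],
})

# Render dates as strings so they can be embedded in column labels.
df["date"] = df["date"].dt.strftime("%Y-%m-%d")

# Long to wide: one row per ID, columns keyed by (variable, date).
out = df.pivot(index="ID", columns="date", values=["var1", "var2"])

# Each column key is a (variable, date) tuple; build "date_var" labels.
out.columns = [f"{date}_{var}" for var, date in out.columns]

print(out)
```

Note that `pivot` raises a `ValueError` when an (ID, date) pair occurs more than once. In that case `pivot_table` with an explicit `aggfunc` is one alternative. Dates missing for an ID come out as NaN, which will likely need imputing or dropping before the result is passed to something like TSNE.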
answered 2021-07-27T08:08:29.567