From the DataFrame below, which holds 2 unique DBKEYs for the same STATION, I need to create a new DataFrame with 2 separate VAL columns (VAL1, VAL2) for that STATION.

    DBKEY STATION DAILY_DATE VAL   
0 T9947 G377C_C 2011-10-01 00:00:00 17.123   
1 T9947 G377C_C 2011-10-02 00:00:00 NaN  
2 T9947 G377C_C 2011-10-03 00:00:00 NaN  
3 T9947 G377C_C 2011-10-04 00:00:00 NaN  
4 T9947 G377C_C 2011-10-05 00:00:00 NaN   
5 T9947 G377C_C 2011-10-06 00:00:00 NaN   
6 T9947 G377C_C 2011-10-07 00:00:00 NaN     
7 T9947 G377C_C 2011-10-08 00:00:00 NaN     
8 T9947 G377C_C 2011-10-09 00:00:00 92.734   
9 T9947 G377C_C 2011-10-10 00:00:00 48.975   
10 T9947 G377C_C 2011-10-11 00:00:00 17.463   
11 T9947 G377C_C 2011-10-12 00:00:00 NaN  
12 T9947 G377C_C 2011-10-13 00:00:00 NaN   
13 T9947 G377C_C 2011-10-14 00:00:00 12.870   
14 T9947 G377C_C 2011-10-15 00:00:00 NaN    
15 T9947 G377C_C 2011-10-16 00:00:00 48.138   
16 T9947 G377C_C 2011-10-17 00:00:00 0.413   
17 T9947 G377C_C 2011-10-18 00:00:00 39.058  
18 T9947 G377C_C 2011-10-19 00:00:00 235.617  
19 T9947 G377C_C 2011-10-20 00:00:00 182.989  
20 T9947 G377C_C 2011-10-21 00:00:00 132.193  
21 T9947 G377C_C 2011-10-22 00:00:00 19.557   
22 T9947 G377C_C 2011-10-23 00:00:00 NaN   
23 T9947 G377C_C 2011-10-24 00:00:00 80.552  
24 T9947 G377C_C 2011-10-25 00:00:00 NaN   
25 T9947 G377C_C 2011-10-26 00:00:00 NaN   
26 T9947 G377C_C 2011-10-27 00:00:00 39.258   
27 T9947 G377C_C 2011-10-28 00:00:00 NaN    
28 T9947 G377C_C 2011-10-29 00:00:00 253.969  
29 T9947 G377C_C 2011-10-30 00:00:00 319.685  
30 T9947 G377C_C 2011-10-31 00:00:00 303.855  
31 W3972 G377C_C 2011-10-01 00:00:00 17.120   
32 W3972 G377C_C 2011-10-02 00:00:00 NaN    
33 W3972 G377C_C 2011-10-03 00:00:00 NaN   
34 W3972 G377C_C 2011-10-04 00:00:00 NaN    
35 W3972 G377C_C 2011-10-05 00:00:00 NaN    
36 W3972 G377C_C 2011-10-06 00:00:00 NaN    
37 W3972 G377C_C 2011-10-07 00:00:00 NaN    
38 W3972 G377C_C 2011-10-08 00:00:00 NaN    
39 W3972 G377C_C 2011-10-09 00:00:00 92.730  
40 W3972 G377C_C 2011-10-10 00:00:00 48.980  
41 W3972 G377C_C 2011-10-11 00:00:00 17.460   
42 W3972 G377C_C 2011-10-12 00:00:00 NaN    
43 W3972 G377C_C 2011-10-13 00:00:00 NaN    
44 W3972 G377C_C 2011-10-14 00:00:00 12.870   
45 W3972 G377C_C 2011-10-15 00:00:00 NaN    
46 W3972 G377C_C 2011-10-16 00:00:00 48.140   
47 W3972 G377C_C 2011-10-17 00:00:00 0.410   
48 W3972 G377C_C 2011-10-18 00:00:00 39.060   
49 W3972 G377C_C 2011-10-19 00:00:00 235.620  
50 W3972 G377C_C 2011-10-20 00:00:00 182.990  
51 W3972 G377C_C 2011-10-21 00:00:00 132.190  
52 W3972 G377C_C 2011-10-22 00:00:00 19.560   
53 W3972 G377C_C 2011-10-23 00:00:00 NaN  
54 W3972 G377C_C 2011-10-24 00:00:00 80.550   
55 W3972 G377C_C 2011-10-25 00:00:00 NaN   
56 W3972 G377C_C 2011-10-26 00:00:00 NaN    
57 W3972 G377C_C 2011-10-27 00:00:00 39.260   
58 W3972 G377C_C 2011-10-28 00:00:00 NaN    
59 W3972 G377C_C 2011-10-29 00:00:00 253.970  
60 W3972 G377C_C 2011-10-30 00:00:00 319.690  
61 W3972 G377C_C 2011-10-31 00:00:00 303.860  

So I need the result to have only 31 rows, with STATION, DAILY_DATE, VAL1 (the first set of DBKEYs) and VAL2 (the second set of DBKEYs).

STATION     DAILY_DATE  VAL1      VAL2
G377C_C     10/1/2011   17.123    17.12
G377C_C     10/2/2011   NaN   NaN
G377C_C     10/3/2011   NaN   NaN
G377C_C     10/4/2011   NaN   NaN
G377C_C     10/5/2011   NaN   NaN
G377C_C     10/6/2011   NaN   NaN
G377C_C     10/7/2011   NaN   NaN
G377C_C     10/8/2011   NaN   NaN
G377C_C     10/9/2011   92.734    92.73
G377C_C     10/10/2011  48.975    48.98
G377C_C     10/11/2011  17.463    17.46
G377C_C     10/12/2011  NaN   NaN
G377C_C     10/13/2011  NaN   NaN
G377C_C     10/14/2011  12.87     12.87
G377C_C     10/15/2011  NaN   NaN
G377C_C     10/16/2011  48.138    48.14
G377C_C     10/17/2011  0.413     0.41
G377C_C     10/18/2011  39.058    39.06
G377C_C     10/19/2011  235.617   235.62
G377C_C     10/20/2011  182.989   182.99
G377C_C     10/21/2011  132.193   132.19
G377C_C     10/22/2011  19.557    19.56
G377C_C     10/23/2011  NaN   NaN
G377C_C     10/24/2011  80.552    80.55
G377C_C     10/25/2011  NaN   NaN
G377C_C     10/26/2011  NaN   NaN
G377C_C     10/27/2011  39.258    39.26
G377C_C     10/28/2011  NaN   NaN
G377C_C     10/29/2011  253.969   253.97
G377C_C     10/30/2011  319.685   319.69
G377C_C     10/31/2011  303.855   303.86

1 Answer

If I understand correctly, this seems fairly straightforward. unstack() should take care of it:

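In [1]: from pandas import DataFrame

In [2]: df = DataFrame({
   ...:     "DBKEY": ['T9947', 'T9947', 'T9947', 'W3972', 'W3972', 'W3972'],
   ...:     "STATION": ['G377C_C', 'G377C_C', 'G377C_C', 'G377C_C', 'G377C_C', 'G377C_C'],
   ...:     "DAILY_DATE": ['2011-10-01 00:00:00', '2011-10-02 00:00:00', '2011-10-03 00:00:00',
   ...:                    '2011-10-01 00:00:00', '2011-10-02 00:00:00', '2011-10-03 00:00:00'],
   ...:     # use real floats and real NaN values rather than the strings 'NaN' / '17.120'
   ...:     "VAL": [17.123, float('nan'), float('nan'), 17.120, float('nan'), float('nan')],
   ...: })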
In [3]: df
Out[3]:
            DAILY_DATE  DBKEY  STATION     VAL
0  2011-10-01 00:00:00  T9947  G377C_C  17.123
1  2011-10-02 00:00:00  T9947  G377C_C     NaN
2  2011-10-03 00:00:00  T9947  G377C_C     NaN
3  2011-10-01 00:00:00  W3972  G377C_C  17.120
4  2011-10-02 00:00:00  W3972  G377C_C     NaN
5  2011-10-03 00:00:00  W3972  G377C_C     NaN

In [4]: df2 = df.set_index(["STATION", "DBKEY", "DAILY_DATE"])
In [5]: df2
Out[5]:
                                      VAL
STATION DBKEY DAILY_DATE                 
G377C_C T9947 2011-10-01 00:00:00  17.123
              2011-10-02 00:00:00     NaN
              2011-10-03 00:00:00     NaN
        W3972 2011-10-01 00:00:00  17.120
              2011-10-02 00:00:00     NaN
              2011-10-03 00:00:00     NaN

In [6]: df3 = df2.unstack(level=1)
In [7]: df3
Out[7]: 
                                VAL        
DBKEY                         T9947   W3972
STATION DAILY_DATE                         
G377C_C 2011-10-01 00:00:00  17.123  17.120
        2011-10-02 00:00:00     NaN     NaN
        2011-10-03 00:00:00     NaN     NaN
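
The unstacked frame above still has a two-level column header (VAL / DBKEY) and an index made of STATION and DAILY_DATE, whereas the question asks for flat STATION, DAILY_DATE, VAL1, VAL2 columns. A minimal sketch of one way to get there follows. It assumes that "first" and "second" DBKEY simply mean the sorted order of the DBKEY values (T9947 before W3972), which is how unstack() orders the new columns; the names `wide` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Small sample in the same shape as the question's data (3 days per DBKEY).
df = pd.DataFrame({
    "DBKEY": ["T9947"] * 3 + ["W3972"] * 3,
    "STATION": ["G377C_C"] * 6,
    "DAILY_DATE": pd.to_datetime(["2011-10-01", "2011-10-02", "2011-10-03"] * 2),
    "VAL": [17.123, np.nan, np.nan, 17.120, np.nan, np.nan],
})

# Select VAL as a Series before unstacking so the result has a single
# column level (one column per DBKEY) instead of a (VAL, DBKEY) MultiIndex.
wide = df.set_index(["STATION", "DAILY_DATE", "DBKEY"])["VAL"].unstack("DBKEY")

# Assumption: number the columns VAL1, VAL2, ... in sorted DBKEY order.
wide.columns = [f"VAL{i}" for i in range(1, len(wide.columns) + 1)]

# Turn STATION and DAILY_DATE back into ordinary columns.
result = wide.reset_index()
print(result)
```

If a particular DBKEY has to map to a particular VAL column, renaming with an explicit dictionary (for example `wide.rename(columns={"T9947": "VAL1", "W3972": "VAL2"})`) is safer than relying on sort order.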
answered 2013-04-26T17:52:33.560