1

我有一个条目表,如下所示。该表缺少日期条目,我想用月末缺少的日期值填充这些条目。对于第一列,添加的缺失日期条目行应填充相同的值,对于值列,我希望它们填充为 0。

FirstName   MiddleName  LastName    Date    Value1  Value2  Value3
first1  middle1 last1   1/31/2020   51  80  19
first1  middle1 last1   2/29/2020   14  44  56
first1  middle1 last1   4/30/2020   57  96  40
first1  middle1 last1   6/30/2020   58  65  3
first1  middle1 last1   8/31/2020   1   34  4
first1  middle1 last1   10/31/2020  40  38  53
first1  middle1 last1   12/31/2020  93  65  41
first1  middle1 last1   2/28/2021   3   43  0
first1  middle1 last1   4/30/2021   46  61  52
first2  middle2 last2   1/31/2020   64  19  33
first2  middle2 last2   2/29/2020   28  71  16
first2  middle2 last2   4/30/2020   2   94  78
first2  middle2 last2   5/31/2020   78  99  87
first2  middle2 last2   6/30/2020   10  70  14
first2  middle2 last2   7/31/2020   30  30  59
first2  middle2 last2   8/31/2020   55  96  73
first2  middle2 last2   10/31/2020  22  43  23
first2  middle2 last2   11/30/2020  12  4   84
first2  middle2 last2   1/31/2021   59  93  1
first2  middle2 last2   2/28/2021   19  33  52
first2  middle2 last2   3/31/2021   46  12  97
first2  middle2 last2   4/30/2021   41  44  59
first2  middle2 last2   5/31/2021   67  84  96
first2  middle2 last2   6/30/2021   52  69  78
first3  middle3 last3   4/30/2020   5   63  30
first3  middle3 last3   5/31/2020   45  22  7
first3  middle3 last3   6/30/2020   76  2   33
first3  middle3 last3   8/31/2020   81  25  52
first3  middle3 last3   9/30/2020   55  3   32
first3  middle3 last3   11/30/2020  46  45  80
first3  middle3 last3   12/31/2020  17  81  74
first3  middle3 last3   1/31/2021   98  6   55

'''

预期产出

FirstName   MiddleName  LastName    Date    Value1  Value2  Value3
first1  middle1 last1   1/31/2020   51  80  19
first1  middle1 last1   2/29/2020   14  44  56
first1  middle1 last1   3/31/2020   0   0   0
first1  middle1 last1   4/30/2020   57  96  40
first1  middle1 last1   5/31/2020   0   0   0
first1  middle1 last1   6/30/2020   58  65  3
first1  middle1 last1   7/31/2020   0   0   0
first1  middle1 last1   8/31/2020   1   34  4
first1  middle1 last1   9/30/2020   0   0   0
first1  middle1 last1   10/31/2020  40  38  53
first1  middle1 last1   11/30/2020  0   0   0
first1  middle1 last1   12/31/2020  93  65  41
first1  middle1 last1   1/31/2021   0   0   0
first1  middle1 last1   2/28/2021   3   43  0
first1  middle1 last1   3/31/2021   0   0   0
first1  middle1 last1   4/30/2021   46  61  52
first2  middle2 last2   1/31/2020   64  19  33
first2  middle2 last2   2/29/2020   28  71  16
first2  middle2 last2   3/31/2020   0   0   0
first2  middle2 last2   4/30/2020   2   94  78
first2  middle2 last2   5/31/2020   78  99  87
first2  middle2 last2   6/30/2020   10  70  14
first2  middle2 last2   7/31/2020   30  30  59
first2  middle2 last2   8/31/2020   55  96  73
first2  middle2 last2   9/30/2020   0   0   0
first2  middle2 last2   10/31/2020  22  43  23
first2  middle2 last2   11/30/2020  12  4   84
first2  middle2 last2   12/31/2020  0   0   0
first2  middle2 last2   1/31/2021   59  93  1
first2  middle2 last2   2/28/2021   19  33  52
first2  middle2 last2   3/31/2021   46  12  97
first2  middle2 last2   4/30/2021   41  44  59
first2  middle2 last2   5/31/2021   67  84  96
first2  middle2 last2   6/30/2021   52  69  78
first3  middle3 last3   4/30/2020   5   63  30
first3  middle3 last3   5/31/2020   45  22  7
first3  middle3 last3   6/30/2020   76  2   33
first3  middle3 last3   7/31/2020   0   0   0
first3  middle3 last3   8/31/2020   81  25  52
first3  middle3 last3   9/30/2020   55  3   32
first3  middle3 last3   10/31/2020  0   0   0
first3  middle3 last3   11/30/2020  46  45  80
first3  middle3 last3   12/31/2020  17  81  74
first3  middle3 last3   1/31/2021   98  6   55

我尝试将重采样与数据框一起使用,但没有得到所需的输出。df = df.set_index('Date').resample('M').ffill().reset_index()

4

1 回答 1

2

编辑:有一个比下面的第二个更好的解决方案:

你很接近你的逻辑。你可以只包括.groupby()之前.resample.asfreq()之后。

df['Date'] = pd.to_datetime(df['Date'])
cols = ['FirstName','MiddleName','LastName']
df = (df.set_index('Date')
        .groupby(cols)
        .resample('M')
        .asfreq(fill_value=0)
        .drop(cols, axis=1)
        .reset_index())
df
Out[128]: 
   FirstName MiddleName LastName       Date  Value1  Value2  Value3
0     first1    middle1    last1 2020-01-31      51      80      19
1     first1    middle1    last1 2020-02-29      14      44      56
2     first1    middle1    last1 2020-03-31       0       0       0
3     first1    middle1    last1 2020-04-30      57      96      40
4     first1    middle1    last1 2020-05-31       0       0       0
5     first1    middle1    last1 2020-06-30      58      65       3
6     first1    middle1    last1 2020-07-31       0       0       0
7     first1    middle1    last1 2020-08-31       1      34       4
8     first1    middle1    last1 2020-09-30       0       0       0
9     first1    middle1    last1 2020-10-31      40      38      53
10    first1    middle1    last1 2020-11-30       0       0       0
11    first1    middle1    last1 2020-12-31      93      65      41
12    first1    middle1    last1 2021-01-31       0       0       0
13    first1    middle1    last1 2021-02-28       3      43       0
14    first1    middle1    last1 2021-03-31       0       0       0
15    first1    middle1    last1 2021-04-30      46      61      52
16    first2    middle2    last2 2020-01-31      64      19      33
17    first2    middle2    last2 2020-02-29      28      71      16
18    first2    middle2    last2 2020-03-31       0       0       0
19    first2    middle2    last2 2020-04-30       2      94      78
20    first2    middle2    last2 2020-05-31      78      99      87
21    first2    middle2    last2 2020-06-30      10      70      14
22    first2    middle2    last2 2020-07-31      30      30      59
23    first2    middle2    last2 2020-08-31      55      96      73
24    first2    middle2    last2 2020-09-30       0       0       0
25    first2    middle2    last2 2020-10-31      22      43      23
26    first2    middle2    last2 2020-11-30      12       4      84
27    first2    middle2    last2 2020-12-31       0       0       0
28    first2    middle2    last2 2021-01-31      59      93       1
29    first2    middle2    last2 2021-02-28      19      33      52
30    first2    middle2    last2 2021-03-31      46      12      97
31    first2    middle2    last2 2021-04-30      41      44      59
32    first2    middle2    last2 2021-05-31      67      84      96
33    first2    middle2    last2 2021-06-30      52      69      78
34    first3    middle3    last3 2020-04-30       5      63      30
35    first3    middle3    last3 2020-05-31      45      22       7
36    first3    middle3    last3 2020-06-30      76       2      33
37    first3    middle3    last3 2020-07-31       0       0       0
38    first3    middle3    last3 2020-08-31      81      25      52
39    first3    middle3    last3 2020-09-30      55       3      32
40    first3    middle3    last3 2020-10-31       0       0       0
41    first3    middle3    last3 2020-11-30      46      45      80
42    first3    middle3    last3 2020-12-31      17      81      74
43    first3    middle3    last3 2021-01-31      98       6      55

方法#2 reindex(更复杂)

它变得有点复杂,因为您需要在组内填充值。我不确定是否有比以下解决方案更简单的方法:

  1. 您可以创建一个包含缺少的月末日期值的date_range()with 。freq='M'您稍后将使用它来重新索引。
  2. 您正在组内重新编制索引,因此您可以为名为 的组创建唯一标识符FullName。这个新列也将在重新索引时传递,以便您在组内重新索引。
  3. MultiIndex利用上面的 #1 和 #2制作pd.MultiIndex.from_product()
  4. 使用set_index()并设置FirstNameDate作为索引。接下来,您将.reindex 传递我们在#3 中创建的 mmultiindex 的数据框。
  5. 清理数据框,以便使用正确的值填充FirstName,MiddleNameLastName.groupby.transform(max)
  6. 最后,清理列的顺序,不包括我之前创建的临时列FullName.

您需要使用pd.MultiIndex.from_product()和重新索引FirstName一个date_range日期,称为s. 这是它的症结所在,它可以让您填写set_indexreindex填写缺失的日期。

import pandas as pd
df = pd.read_clipboard()
df['Date'] = pd.to_datetime(df['Date'])
s = pd.date_range(df['Date'].min(), df['Date'].max(), freq='M')
df['FullName'] = df['FirstName'] + df['MiddleName'] + df['LastName']
idx = pd.MultiIndex.from_product([df['FullName'].unique(), s], names=['FullName', 'Date'])
df = df.set_index(['FullName', 'Date']).reindex(idx, fill_value='0').reset_index()
df['FirstName'] = df.groupby('FullName')['FirstName'].transform('max')
df['MiddleName'] = df.groupby('FullName')['MiddleName'].transform('max')
df['LastName'] = df.groupby('FullName')['LastName'].transform('max')
df = df[['FirstName', 'MiddleName', 'LastName', 'Date', 'Value1', 'Value2',
       'Value3']].replace('0',0)
df
Out[98]: 
   FirstName MiddleName LastName       Date  Value1  Value2  Value3
0     first1    middle1    last1 2020-01-31      51      80      19
1     first1    middle1    last1 2020-02-29      14      44      56
2     first1    middle1    last1 2020-03-31       0       0       0
3     first1    middle1    last1 2020-04-30      57      96      40
4     first1    middle1    last1 2020-05-31       0       0       0
5     first1    middle1    last1 2020-06-30      58      65       3
6     first1    middle1    last1 2020-07-31       0       0       0
7     first1    middle1    last1 2020-08-31       1      34       4
8     first1    middle1    last1 2020-09-30       0       0       0
9     first1    middle1    last1 2020-10-31      40      38      53
10    first1    middle1    last1 2020-11-30       0       0       0
11    first1    middle1    last1 2020-12-31      93      65      41
12    first1    middle1    last1 2021-01-31       0       0       0
13    first1    middle1    last1 2021-02-28       3      43       0
14    first1    middle1    last1 2021-03-31       0       0       0
15    first1    middle1    last1 2021-04-30      46      61      52
16    first1    middle1    last1 2021-05-31       0       0       0
17    first1    middle1    last1 2021-06-30       0       0       0
18    first2    middle2    last2 2020-01-31      64      19      33
19    first2    middle2    last2 2020-02-29      28      71      16
20    first2    middle2    last2 2020-03-31       0       0       0
21    first2    middle2    last2 2020-04-30       2      94      78
22    first2    middle2    last2 2020-05-31      78      99      87
23    first2    middle2    last2 2020-06-30      10      70      14
24    first2    middle2    last2 2020-07-31      30      30      59
25    first2    middle2    last2 2020-08-31      55      96      73
26    first2    middle2    last2 2020-09-30       0       0       0
27    first2    middle2    last2 2020-10-31      22      43      23
28    first2    middle2    last2 2020-11-30      12       4      84
29    first2    middle2    last2 2020-12-31       0       0       0
30    first2    middle2    last2 2021-01-31      59      93       1
31    first2    middle2    last2 2021-02-28      19      33      52
32    first2    middle2    last2 2021-03-31      46      12      97
33    first2    middle2    last2 2021-04-30      41      44      59
34    first2    middle2    last2 2021-05-31      67      84      96
35    first2    middle2    last2 2021-06-30      52      69      78
36    first3    middle3    last3 2020-01-31       0       0       0
37    first3    middle3    last3 2020-02-29       0       0       0
38    first3    middle3    last3 2020-03-31       0       0       0
39    first3    middle3    last3 2020-04-30       5      63      30
40    first3    middle3    last3 2020-05-31      45      22       7
41    first3    middle3    last3 2020-06-30      76       2      33
42    first3    middle3    last3 2020-07-31       0       0       0
43    first3    middle3    last3 2020-08-31      81      25      52
44    first3    middle3    last3 2020-09-30      55       3      32
45    first3    middle3    last3 2020-10-31       0       0       0
46    first3    middle3    last3 2020-11-30      46      45      80
47    first3    middle3    last3 2020-12-31      17      81      74
48    first3    middle3    last3 2021-01-31      98       6      55
49    first3    middle3    last3 2021-02-28       0       0       0
50    first3    middle3    last3 2021-03-31       0       0       0
51    first3    middle3    last3 2021-04-30       0       0       0
52    first3    middle3    last3 2021-05-31       0       0       0
53    first3    middle3    last3 2021-06-30       0       0       0
于 2020-09-03T05:09:10.723 回答