I am looking for a way to determine whether a date in a column falls within 7 days of another date in the same column.

Say this is my DataFrame:

import pandas as pd

dic = {'firstname':['Rick','Rick','Rick','John','John','John','David',
                    'David','David','Steve','Steve','Steve','Jim','Jim',
                    'Jim'],
       'lastname':['Smith','Smith','Smith','Jones','Jones','Jones',
                   'Wilson','Wilson','Wilson','Johnson','Johnson',
                   'Johnson','Miller','Miller','Miller'],
       'company':['CFA','CFA','CFA','WND','WND','WND','INO','INO','INO',
                  'CHP','CHP','CHP','MCD','MCD','MCD'],
       'faveday':['2020-03-16','2020-03-11','2020-03-25','2020-04-30',
                  '2020-05-22','2020-05-03','2020-01-31','2020-01-13',
                  '2020-01-10','2020-10-22','2020-10-28','2020-10-22',
                  '2020-10-13','2020-10-28','2020-10-20']}
df = pd.DataFrame(dic)
df['faveday'] = pd.to_datetime(df['faveday'])
print(df)

With the output:

   firstname lastname company    faveday
0       Rick    Smith     CFA 2020-03-16
1       Rick    Smith     CFA 2020-03-11
2       Rick    Smith     CFA 2020-03-25
3       John    Jones     WND 2020-04-30
4       John    Jones     WND 2020-05-22
5       John    Jones     WND 2020-05-03
6      David   Wilson     INO 2020-01-31
7      David   Wilson     INO 2020-01-13
8      David   Wilson     INO 2020-01-10
9      Steve  Johnson     CHP 2020-10-22
10     Steve  Johnson     CHP 2020-10-28
11     Steve  Johnson     CHP 2020-10-22
12       Jim   Miller     MCD 2020-10-13
13       Jim   Miller     MCD 2020-10-28
14       Jim   Miller     MCD 2020-10-20

Then I sort the data:

df = df.sort_values(['firstname','lastname','company','faveday'])
print(df)

To get:

   firstname lastname company    faveday
8      David   Wilson     INO 2020-01-10
7      David   Wilson     INO 2020-01-13
6      David   Wilson     INO 2020-01-31
12       Jim   Miller     MCD 2020-10-13
14       Jim   Miller     MCD 2020-10-20
13       Jim   Miller     MCD 2020-10-28
3       John    Jones     WND 2020-04-30
5       John    Jones     WND 2020-05-03
4       John    Jones     WND 2020-05-22
1       Rick    Smith     CFA 2020-03-11
0       Rick    Smith     CFA 2020-03-16
2       Rick    Smith     CFA 2020-03-25
9      Steve  Johnson     CHP 2020-10-22
11     Steve  Johnson     CHP 2020-10-22
10     Steve  Johnson     CHP 2020-10-28

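Suppose I want to know, going through the rows in their current order (index 8, then 7, 6, 12, and so on), whether a date is within 7 days of another date. (So indices 8 and 7 would both yield True, but index 6 would not.)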

But I also want to group by name. (So indices 12 and 14 are True within the Jim Miller group while 13 is False, but in the Steve Johnson group indices 9, 11, and 10 are all True.)

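Is there a way to subtract the dates within each group and then create a column that says TRUE or FALSE depending on whether the date is within 7 days of another date in that group?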

I am looking for output like this:

   firstname lastname company    faveday seven_days
8      David   Wilson     INO 2020-01-10       TRUE
7      David   Wilson     INO 2020-01-13       TRUE
6      David   Wilson     INO 2020-01-31      FALSE
12       Jim   Miller     MCD 2020-10-13       TRUE
14       Jim   Miller     MCD 2020-10-20       TRUE
13       Jim   Miller     MCD 2020-10-28      FALSE
3       John    Jones     WND 2020-04-30       TRUE
5       John    Jones     WND 2020-05-03       TRUE
4       John    Jones     WND 2020-05-22      FALSE
1       Rick    Smith     CFA 2020-03-11       TRUE
0       Rick    Smith     CFA 2020-03-16       TRUE
2       Rick    Smith     CFA 2020-03-25      FALSE
9      Steve  Johnson     CHP 2020-10-22       TRUE
11     Steve  Johnson     CHP 2020-10-22       TRUE
10     Steve  Johnson     CHP 2020-10-28       TRUE

2 Answers

You can try this.

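from datetime import timedelta

# Assumes df is already sorted by faveday within each name (see the sort above).
# Per group: gap to the previous date; the first row has no previous date,
# so bfill() gives it the gap to the next one.
m = (df.groupby(['firstname','lastname']).
        apply(lambda x: x['faveday'].sub(x['faveday'].shift()).bfill()).
        reset_index(level=[0,1],drop=True))
df['seven_days'] = m.le(timedelta(days=7))
print(df)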

   firstname lastname company    faveday  seven_days
8      David   Wilson     INO 2020-01-10        True
7      David   Wilson     INO 2020-01-13        True
6      David   Wilson     INO 2020-01-31       False
12       Jim   Miller     MCD 2020-10-13        True
14       Jim   Miller     MCD 2020-10-20        True
13       Jim   Miller     MCD 2020-10-28       False
3       John    Jones     WND 2020-04-30        True
5       John    Jones     WND 2020-05-03        True
4       John    Jones     WND 2020-05-22       False
1       Rick    Smith     CFA 2020-03-11        True
0       Rick    Smith     CFA 2020-03-16        True
2       Rick    Smith     CFA 2020-03-25       False
9      Steve  Johnson     CHP 2020-10-22        True
11     Steve  Johnson     CHP 2020-10-22        True
10     Steve  Johnson     CHP 2020-10-28        True
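One caveat with a consecutive-difference approach like the one above: apart from the first row of each group, every row is compared only with the date before it, so a date that is far from its predecessor but close to its successor would come out False. The following is a minimal sketch, not a definitive implementation, that checks both neighbours instead. The toy frame, the column names `name` and `day`, and the helper variables `gap_prev` and `gap_next` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: in group A, 2020-01-20 is 19 days after the
# previous date but only 2 days before the next one.
df = pd.DataFrame({
    'name': ['A', 'A', 'A', 'B', 'B'],
    'day': pd.to_datetime(['2020-01-01', '2020-01-20', '2020-01-22',
                           '2020-03-01', '2020-03-30']),
})

# Sort so that the nearest dates in each group are adjacent rows.
df = df.sort_values(['name', 'day'])

grouped = df.groupby('name')['day']
# Gap to the previous date in the group (NaT for the first row of a group).
gap_prev = grouped.diff()
# Gap to the next date in the group (NaT for the last row of a group).
gap_next = grouped.diff(-1).abs()

limit = pd.Timedelta(days=7)
# Comparisons against NaT evaluate to False, so a date with no neighbour
# on one side is decided by the other side alone.
df['seven_days'] = (gap_prev <= limit) | (gap_next <= limit)
print(df)
```

On this toy data the middle row of group A is flagged True because of its successor, whereas a previous-row-only comparison would flag it False. On the question's data both approaches should agree, since no date there is close only to its successor except the first row of a group.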
answered 2020-06-08T18:52:32.267

Let's try a custom function using NumPy broadcasting:

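import numpy as np
def sefd(x):
    # Pairwise gaps in days; each date matches itself, so require >= 2 hits.
    diffs = np.abs(x.values - x.values[:, None]) / np.timedelta64(1, 'D')
    return np.sum(diffs <= 7, axis=1) >= 2
s = df.groupby(['firstname', 'lastname', 'company'])['faveday'].transform(sefd)
print(s.sort_index())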
0      True
1      True
2     False
3      True
4     False
5      True
6     False
7      True
8      True
9      True
10     True
11     True
12     True
13    False
14     True
Name: faveday, dtype: bool
df['seven_days'] = s
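To see what the broadcasting step does inside one group, here is a small illustrative sketch using the three Jim Miller dates from the question. The variable names `dates`, `gaps`, `within`, and `flags` are chosen here for illustration and are not part of the answer's code.

```python
import numpy as np

# The three Jim Miller dates from the question.
dates = np.array(['2020-10-13', '2020-10-20', '2020-10-28'],
                 dtype='datetime64[D]')

# dates has shape (3,), dates[:, None] has shape (3, 1); subtracting them
# broadcasts to a (3, 3) matrix of pairwise gaps, converted to days.
gaps = np.abs(dates - dates[:, None]) / np.timedelta64(1, 'D')
# gaps is:
# [[ 0.  7. 15.]
#  [ 7.  0.  8.]
#  [15.  8.  0.]]

within = gaps <= 7
# Every row counts its own zero on the diagonal, hence the ">= 2" threshold:
# a date qualifies only if some *other* date is within 7 days of it.
flags = within.sum(axis=1) >= 2
print(flags.tolist())  # [True, True, False]
```

Because every date is compared with every other date in its group, the row order does not matter, but the intermediate matrix grows with the square of the group size, which may be a concern for very large groups.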
answered 2020-06-08T18:48:05.620