DataFrame:

col1  col_entity col2
a        a1       50
b        b1       40
a        a2       40
a        a3       30
b        b2       20
a        a4       20
b        b3       30
b        b4       50

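I need to group the rows by col1, sort each group by col2 from highest to lowest, compute the difference between consecutive rows, and then create a column holding a string statement for each group. Expected output: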

col1  col_entity col2   diff   col_statement
a        a1       50     10     difference between a1 and a2 is 10
b        a2       40     10     difference between a2 and a3 is 10
a        a3       30     10     difference between a3 and a4 is 10
a        a4       20     nan    **will drop this row**
b        b1       40     10     difference between b1 and b4 is 10
a        b4       50     10     difference between b4 and b3 is 10
b        b3       30     10     difference between b3 and b2 is 10
b        b2       20     nan    **will drop this row**

Please help. Thanks in advance.

1 Answer

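You can do this with a couple of np.where statements:

  1. Use diff().abs().shift(-1) to get the absolute difference between each row and the next one.
  2. Return NaN when the letter extracted from one row does not match the letter extracted from the next row (compared via .shift(-1)).
  3. For the col_statement column, use np.where() to build the string conditionally from the other columns, depending on whether diff is NaN.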

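import numpy as np

# Column names follow the frame as it was read in (see the output below):
# 'col1' holds the entity labels (a1, b1, ...) and 'col_entity col2' holds the values.
# Keep the difference only when this row and the next share the same letter.
df['diff'] = np.where(df['col1'].str.extract('([a-z])', expand=False)
                          == df['col1'].shift(-1).str.extract('([a-z])', expand=False),
                      df['col_entity col2'].diff().abs().shift(-1), np.nan)
# Build the statement, flagging the rows whose diff is NaN.
df['col_statement'] = np.where(df['diff'].isnull(),
                               '**will drop this row**',
                               'difference between' + ' ' + df['col1'] + ' and '
                                   + df['col1'].shift(-1) + ' is ' + df['diff'].astype(str))
df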
Out[1]: 
  col1  col_entity col2  diff                         col_statement
a   a1               50  10.0  difference between a1 and a2 is 10.0
b   a2               40  10.0  difference between a2 and a3 is 10.0
a   a3               30  10.0  difference between a3 and a4 is 10.0
a   a4               20   NaN                **will drop this row**
b   b1               40  10.0  difference between b1 and b4 is 10.0
a   b4               50  10.0  difference between b4 and b3 is 10.0
b   b3               30  10.0  difference between b3 and b2 is 10.0
b   b2               20   NaN                **will drop this row**
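
If the frame is instead parsed into the three separate columns shown in the question, the group-then-sort step can be done explicitly. The following is only a sketch under that assumption: the frame is rebuilt by hand from the question's data, and `next_entity` and `out` are helper names introduced here for illustration. Note that a strict high-to-low sort places b4 (50) ahead of b1 (40), so the statements for group b come out in a different order than in the question's expected table.

```python
import pandas as pd

# Sample data from the question, with col1 as the group key (assumed layout).
df = pd.DataFrame({
    "col1": ["a", "b", "a", "a", "b", "a", "b", "b"],
    "col_entity": ["a1", "b1", "a2", "a3", "b2", "a4", "b3", "b4"],
    "col2": [50, 40, 40, 30, 20, 20, 30, 50],
})

# Order rows by group, then by col2 from highest to lowest within each group.
out = (df.sort_values(["col1", "col2"], ascending=[True, False])
         .reset_index(drop=True))

grouped = out.groupby("col1")
# Entity and value of the next row within the same group;
# the last row of every group gets NaN.
out["next_entity"] = grouped["col_entity"].shift(-1)
out["diff"] = out["col2"] - grouped["col2"].shift(-1)

# Drop the last row of each group, which has no following row to compare with.
out = out.dropna(subset=["diff"]).copy()
out["diff"] = out["diff"].astype(int)

out["col_statement"] = ("difference between " + out["col_entity"]
                        + " and " + out["next_entity"]
                        + " is " + out["diff"].astype(str))
out = out.drop(columns="next_entity")
print(out)
```

Shifting inside the groupby means a row is never compared with a row from another group, so no letter extraction is needed to detect group boundaries.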
answered 2020-10-09T09:58:24.813