

I am using

df_topics = df.groupby(['string1','theme']).count().reset_index().first()

which gives me this error:

TypeError: first() missing 1 required positional argument: 'offset'

I just want to count the duplicates per group and select the first non-null row.
That is why I used first(): it gives me the first non-null row.

DataFrame

string1      theme      type    tool    site 
houses       white      A       phone           
houses       black      B               cloud
houses       white      A               web
houses       white      A       phone   web
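
For reproducibility, the frame above could be built roughly as follows. This is only a sketch: it assumes the blank cells are missing values (NaN) rather than empty strings, which is what first() needs in order to skip them.

```python
import numpy as np
import pandas as pd

# Sample data from the table above; blank cells are ASSUMED to be NaN.
df = pd.DataFrame({
    "string1": ["houses", "houses", "houses", "houses"],
    "theme":   ["white", "black", "white", "white"],
    "type":    ["A", "B", "A", "A"],
    "tool":    ["phone", np.nan, np.nan, "phone"],
    "site":    [np.nan, "cloud", "web", "web"],
})
print(df)
```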

Expected output

string1      theme      type    tool    site   count
houses       white      A       phone   web    3
houses       black      B               cloud  1

My main focus is counting string1, but I also want the rows that appear in the final output to be the ones with the fewest null values.

How can I solve this?


1 Answer


You can create a dictionary that maps every column except the grouping keys to the first function, add count for string1, pass it to GroupBy.agg, and finally rename the column:

d = dict.fromkeys(df.columns.difference(['string1','theme']), 'first')
d['string1'] = 'count'
df_topics = (df.groupby(['string1','theme'], sort=False)
               .agg(d)
               .rename(columns={'string1':'count'})
               .reset_index())
print (df_topics)

  string1  theme   site   tool type  count
0  houses  white    web  phone    A      3
1  houses  black  cloud    NaN    B      1

Details:

print (d)
{'site': 'first', 'tool': 'first', 'type': 'first', 'string1': 'count'}
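
Two points may be worth spelling out here. In the question, .first() is chained after .reset_index(), so it is called on a plain DataFrame instead of on the groupby object, which appears to be why pandas asks for an offset argument. GroupBy.first, by contrast, takes the first non-null value of each column independently, so a result row can combine values from different source rows. A minimal sketch, assuming the blank cells in the sample data are NaN:

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are ASSUMED to be NaN.
df = pd.DataFrame({
    "string1": ["houses", "houses", "houses", "houses"],
    "theme":   ["white", "black", "white", "white"],
    "type":    ["A", "B", "A", "A"],
    "tool":    ["phone", np.nan, np.nan, "phone"],
    "site":    [np.nan, "cloud", "web", "web"],
})

g = df.groupby(["string1", "theme"], sort=False)

# count().reset_index() returns a regular DataFrame, not a groupby object,
# so a .first() chained after it is no longer GroupBy.first.
counted = g.count().reset_index()
print(type(counted).__name__)

# GroupBy.first skips NaN column by column: for (houses, white),
# 'tool' comes from the first row and 'site' from the third.
firsts = g.first()
print(firsts)
```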

Or use named aggregation:

df_topics = (df.groupby(['string1','theme'], sort=False)
              .agg(type=('type','first'),
                   tool=('tool','first'),
                   site=('site', 'first'),
                   count=('string1','count'))
              .reset_index())
print (df_topics)
  string1  theme type   tool   site  count
0  houses  white    A  phone    web      3
1  houses  black    B    NaN  cloud      1

The same, with the values generated dynamically:

d = {x: (x, 'first') for x in df.columns.difference(['string1','theme'])}
d['count'] = ('string1','count')


df_topics = (df.groupby(['string1','theme'], sort=False)
               .agg(**d)
               .reset_index())
print (df_topics)
  string1  theme   site   tool type  count
0  houses  white    web  phone    A      3
1  houses  black  cloud    NaN    B      1

EDIT 1: Alternatively, combine GroupBy.size and GroupBy.first:

import pandas as pd

g = df.groupby(['string1','theme'], sort=False)
df1 = g.size()          # number of rows in each group
df_topics = g.first()   # first non-null value of each column per group

df_topics = pd.concat([df_topics, df1.rename("count")], axis=1, sort=False).reset_index()
print (df_topics)
  string1  theme type   tool   site  count
0  houses  white    A  phone    web      3
1  houses  black    B    NaN  cloud      1
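
The choice of size over count in this last variant matters when a column has missing values: size counts the rows in each group, while count only counts non-null entries of the column it is applied to. A short sketch of the difference, again assuming the blank cells are NaN:

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are ASSUMED to be NaN.
df = pd.DataFrame({
    "string1": ["houses", "houses", "houses", "houses"],
    "theme":   ["white", "black", "white", "white"],
    "type":    ["A", "B", "A", "A"],
    "tool":    ["phone", np.nan, np.nan, "phone"],
    "site":    [np.nan, "cloud", "web", "web"],
})

g = df.groupby(["string1", "theme"], sort=False)

sizes = g.size()                 # rows per group, missing values included
tool_counts = g["tool"].count()  # non-null 'tool' values per group

print(sizes)
print(tool_counts)
```

Counting a column that is never null within a group, such as string1 here, gives the same numbers as size.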
Answered 2020-04-19T11:02:38.233