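I want to filter a cuDF dataframe on a column's values and then create a new column based on the specified condition. Basically, how do I apply the following in cuDF?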
df.loc[df.column_name condition, 'new column name'] = 'value if condition is met'
You can also use .query().
Example:
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)
where a and b are the names of columns in the dataframe.
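To make the .query() call above concrete, here is a minimal sketch on a toy dataframe. The column values, the `xdf` alias, and the fallback to pandas when cuDF is not importable are my own assumptions for illustration; only the expression and the `df.query(expr)` call come from the answer.

```python
# Use cuDF when it is available; otherwise fall back to pandas so the
# sketch still runs on a machine without a GPU (assumption, not from the answer).
try:
    import cudf as xdf
except Exception:
    import pandas as xdf

# Toy data (hypothetical); "a" and "b" are the column names used in expr.
df = xdf.DataFrame({"a": [1, 2, 3, 4], "b": [3, 0, 3, 1]})

# Keep the rows where a equals 2 or b equals 3.
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)

print(len(filtered_df))
```

Note that .query() only filters rows; it returns a new dataframe and does not assign anything back to df.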
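While masked_assign works for certain conditions, applymap is syntactically nicer and functionally similar to the Pandas API.
Also, @ashwin-srinath mentioned that __setitem__() is coming in the 0.9 release, so you will be able to just write df[condition] = value. masked_assign will probably go away, since __setitem__() is a Pandas API function and masked_assign is not.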
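For what it's worth, here is a sketch of the Pandas-style assignment the question asks about, written with .loc and a boolean mask. I am assuming a cuDF version in which .loc assignment behaves as it does in pandas; the toy columns, the `flag` column, the `xdf` alias, and the `to_list` helper are hypothetical names for illustration only.

```python
# Use cuDF when it is available; otherwise fall back to pandas
# (assumption made so the sketch runs without a GPU).
try:
    import cudf as xdf
except Exception:
    import pandas as xdf


def to_list(series):
    """Hypothetical helper: copy a cuDF or pandas Series into a Python list."""
    if hasattr(series, "to_pandas"):
        series = series.to_pandas()
    return series.tolist()


df = xdf.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Boolean mask: True for the rows that meet the condition.
condition = df["a"] > 2

# Create the new column with a default, then overwrite the masked rows.
df["flag"] = 0
df.loc[condition, "flag"] = 1

print(to_list(df["flag"]))
```

Creating the column with a default first means the rows that do not meet the condition get an explicit value rather than nulls.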
# value to be replaced in series
value = 'value if condition is met'
# condition to qualify for replacement
mask = df.column_name condition
# https://docs.rapids.ai/api/cudf/stable/
df['new column name'] = df.masked_assign(value, mask)
"""explanation:
>> if there is no pool, pool_sqft should be 0
"""
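# value to write where the mask is True
value = 0
# condition to qualify for replacement: rows with no pool
mask = df_train['pool_count'] == 0
# https://docs.rapids.ai/api/cudf/stable/
# assumption: masked_assign is called on the existing column (a Series)
# of the same dataframe the mask was built from
df_train['pool_sqft'] = df_train['pool_sqft'].masked_assign(value, mask)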
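Since masked_assign may not be around in later releases, the same pool rule can be sketched with Series.where, which keeps values where a condition holds and substitutes another value elsewhere. This is a swapped-in alternative, not the method from the answer, and I am assuming where() behaves in cuDF as it does in pandas. The sample numbers, the `xdf` alias, and the `to_list` helper are hypothetical.

```python
# Use cuDF when it is available; otherwise fall back to pandas
# (assumption made so the sketch runs without a GPU).
try:
    import cudf as xdf
except Exception:
    import pandas as xdf


def to_list(series):
    """Hypothetical helper: copy a cuDF or pandas Series into a Python list."""
    if hasattr(series, "to_pandas"):
        series = series.to_pandas()
    return series.tolist()


# Toy data (hypothetical): two rows claim a pool area despite having no pool.
df_train = xdf.DataFrame({
    "pool_count": [0, 1, 0, 2],
    "pool_sqft": [500.0, 450.0, 120.0, 800.0],
})

# where() keeps values where the condition is True and replaces the rest,
# so the condition is "has a pool", the inverse of the replacement mask.
has_pool = df_train["pool_count"] != 0
df_train["pool_sqft"] = df_train["pool_sqft"].where(has_pool, 0.0)

print(to_list(df_train["pool_sqft"]))
```

The condition passed to where() is the opposite of the mask used with masked_assign: it marks the rows to keep, not the rows to replace.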