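I want to filter a cuDF DataFrame on a column's values and then create a new column based on a given condition. Basically, how do I do the following pandas operation in cuDF?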
df.loc[df.column_name condition, 'new column name'] = 'value if condition is met'
You can also use .query().
Example:
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)
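where a and b are the names of columns in the DataFrame. Note that .query() only returns the matching rows as a new DataFrame; it does not create the new column by itself.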
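A minimal runnable sketch of the .query() approach, using made-up columns a and b. It assumes cuDF is installed and falls back to pandas (whose DataFrame.query accepts the same expression) only so the snippet can run on a machine without a GPU. Because .query() returns a filtered DataFrame, the new column is added to that result.

```python
try:
    import cudf as xdf  # GPU DataFrame library
except ImportError:  # assumption: pandas fallback so the sketch runs without a GPU
    import pandas as xdf

# hypothetical sample data
df = xdf.DataFrame({"a": [1, 2, 3, 2], "b": [3, 4, 5, 6]})

expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)  # keeps rows 0, 1 and 3

# .query() only filters; add the new column on an explicit copy of the result
filtered_df = filtered_df.copy()
filtered_df["new column name"] = "value if condition is met"
```

Only the filtered rows carry the new column here; to keep all rows and set the value conditionally, use a boolean mask on the original DataFrame instead.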
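While masked_assign works for some conditions, applymap is syntactically nicer and behaves more like the Pandas API.
Also, @ashwin-srinath mentioned that __setitem__() is coming in the 0.9 release, so you will be able to write simply df[condition] = value. masked_assign may then go away, since __setitem__() is part of the Pandas API and masked_assign is not.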
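A sketch of the __setitem__-style syntax described above, assuming a cuDF release that supports boolean-mask assignment (the answer dates this to 0.9). The pandas fallback exists only so the snippet runs without a GPU, and the column names and data are hypothetical. For the new column, a default is filled in first and then overwritten where the mask is True, so the example does not rely on .loc creating a column.

```python
try:
    import cudf as xdf
except ImportError:  # assumption: pandas fallback so the sketch runs without a GPU
    import pandas as xdf

# hypothetical sample data
df = xdf.DataFrame({"a": [1, 5, 2, 8], "b": [-1, 2, -3, 4]})

# Series.__setitem__ with a boolean mask: replace negative values with 0
b = df["b"].copy()
b[b < 0] = 0
df["b_clipped"] = b

# pandas-style df.loc[condition, column] = value for a new column:
# fill a default for every row, then overwrite rows that meet the condition
df["new column name"] = "no"
df.loc[df["a"] > 4, "new column name"] = "value if condition is met"
```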
# value to write into the series where the condition holds
value = 'value if condition is met'
# condition to qualify for replacement
mask = df.column_name condition
# https://docs.rapids.ai/api/cudf/stable/
df['new column name'] = df.masked_assign(value, mask)
"""explanation:
>> if there is no pool, pool_sqft should be 0
"""
# value to write into the series where the condition holds
value = 0
# condition to qualify for replacement
mask = df['pool_count'] == 0
# https://docs.rapids.ai/api/cudf/stable/
df['pool_sqft'] = df['pool_sqft'].masked_assign(value, mask)
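masked_assign may not be available in newer cuDF releases. A minimal sketch of the same pool_sqft rule written with Series.where, which exists in both cuDF and pandas, follows. The sample data is made up, and the pandas fallback is only there so the snippet runs without a GPU.

```python
try:
    import cudf as xdf
except ImportError:  # assumption: pandas fallback so the sketch runs without a GPU
    import pandas as xdf

# hypothetical sample data
df = xdf.DataFrame({"pool_count": [0, 1, 0, 2], "pool_sqft": [150, 400, 90, 800]})

# where() keeps values where the condition is True and uses `other` elsewhere:
# keep pool_sqft when there is a pool, otherwise set it to 0
df["pool_sqft"] = df["pool_sqft"].where(df["pool_count"] != 0, 0)
```

The pandas-style form df.loc[df['pool_count'] == 0, 'pool_sqft'] = 0 expresses the same rule with mask-based assignment.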