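I want to filter a cuDF DataFrame by column values and then create a new column based on a specified condition. Basically, how do I do the following in cuDF?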

df.loc[df.column_name condition, 'new column name'] = 'value if condition is met'

3 Answers

You can also use .query().

Example:

expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)

where a and b are the names of columns in the DataFrame.
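
To make the .query() call above concrete, here is a minimal sketch on made-up data. The alias xdf, the sample values, and the pandas fallback (for machines without a GPU) are assumptions for illustration and are not part of the original answer.

```python
try:
    import cudf as xdf  # GPU DataFrame library (hypothetical alias "xdf")
except ImportError:
    import pandas as xdf  # assumed CPU fallback; query() has the same shape here

# Made-up sample data with the column names used in the answer
df = xdf.DataFrame({"a": [1, 2, 3, 4], "b": [3, 0, 5, 1]})

# Keep rows where a equals 2 or b equals 3
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)

print(filtered_df)
```

Note that .query() only filters rows; it returns a new DataFrame and does not add or modify a column by itself.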

Answered 2019-08-02T17:06:13.173

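Although masked_assign works for some conditions, applymap has nicer syntax and is functionally closer to the Pandas API.

In addition, @ashwin-srinath mentioned that __setitem__() is coming in release 0.9, so you will be able to write df[condition] = value. masked_assign may then go away, since __setitem__() is part of the Pandas API and masked_assign is not.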
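
If masked_assign is not available in your release, one pandas-style alternative for building a conditional column is Series.where. This is a swapped-in technique, not the one described in this answer. The sketch below assumes a cuDF version that provides Series.where; the alias xdf, the column names, and the data are made up for illustration.

```python
try:
    import cudf as xdf  # GPU DataFrame library (hypothetical alias "xdf")
except ImportError:
    import pandas as xdf  # assumed CPU fallback with the same method

# Made-up sample data
df = xdf.DataFrame({"score": [35, 80, 55, 90]})

# Series.where keeps values where the condition is True
# and substitutes the second argument everywhere else
condition = df["score"] >= 50
df["score_floor"] = df["score"].where(condition, 0)

print(df)
```

Because the result is assigned to a new column name, the original column is left untouched.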

Answered 2019-07-29T16:04:08.877

The Pandas statement from the question, in cuDF:

# value to be replaced in series 
value = 'value if condition is met'
# condition to qualify for replacement
mask = df.column_name condition

# https://docs.rapids.ai/api/cudf/stable/
df['new column name'] = df.masked_assign(value, mask)

Applied example

"""explanation: 
  >> if there is no pool, pool_sqft should be 0
"""

# value to be replaced in series 
value = 0
# condition to qualify for replacement
mask = df['pool_count'] == 0

# https://docs.rapids.ai/api/cudf/stable/
df['pool_sqft'] = df.masked_assign(value, mask)
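
For comparison, the same pool rule can be sketched with the .loc idiom from the question, updating an existing column in place. This assumes a cuDF release recent enough to support boolean-mask assignment through .loc; the alias xdf, the pandas fallback, and the sample values are assumptions for illustration.

```python
try:
    import cudf as xdf  # GPU DataFrame library (hypothetical alias "xdf")
except ImportError:
    import pandas as xdf  # assumed CPU fallback with the same .loc idiom

# Made-up sample data: two of the rows have no pool but a nonzero area
df = xdf.DataFrame({
    "pool_count": [0, 1, 0, 2],
    "pool_sqft": [120.0, 450.0, 80.0, 600.0],
})

# If there is no pool, pool_sqft should be 0
mask = df["pool_count"] == 0
df.loc[mask, "pool_sqft"] = 0

print(df)
```
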
Answered 2019-07-27T02:37:17.013