python - Variable column name in dask assign() or apply()

Question

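I have code that works in pandas, but I'm having trouble converting it to dask. There is a partial solution here, but it does not let me use a variable as the name of the column I am creating or assigning to.

Here is the working pandas code: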

percent_cols = ['num_unique_words', 'num_words_over_6']

def find_fraction(row, col):
    return row[col] / row['num_words']

for c in percent_cols:
    df[c] = df.apply(find_fraction, col=c, axis=1)

Here is the dask code that doesn't do what I want:

data = dd.from_pandas(df, npartitions=8)

for c in percent_cols:
    data = data.assign(c = data[c] / data.num_words)

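This assigns the result to a new column literally named c, rather than modifying the values of data[c] (which is what I want). Creating a new column would also be fine if I could supply its name from a variable. For example, if this worked: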

for c in percent_cols:
    name = c + "new"
    data = data.assign(name = data[c] / data.num_words)

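For obvious reasons, Python doesn't allow an expression on the left-hand side of an = in a keyword argument, so the value stored in name is ignored and the column is simply called "name".

How can I use a variable as the name of the column I'm assigning to? The loop runs far more iterations than I'm willing to copy and paste.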

score 2 · Accepted Answer

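This can be treated as a Python language question:

Question: How do I use the value of a variable as the name of a keyword argument?

Answer: Build a dictionary and unpack it with **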

c = 'name'
f(c=5)       # 'c' is used as the keyword argument name, not what we want
f(**{c: 5})  # 'name' is used as the keyword argument name, this is great
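
To see the difference concretely, here is a minimal sketch in plain Python. The function describe is a hypothetical helper invented for this illustration (it is not part of dask or pandas), and the "_new" suffix is just an example; it only reports which keyword names it was called with.

```python
def describe(**kwargs):
    """Hypothetical helper: return the keyword names it received, sorted."""
    return sorted(kwargs)

percent_cols = ['num_unique_words', 'num_words_over_6']

literal_names = []
dynamic_names = []
for c in percent_cols:
    # The keyword name is the literal identifier "c", whatever c holds.
    literal_names += describe(c=1)
    # Unpacking a one-item dict uses the string stored in c (plus a suffix).
    dynamic_names += describe(**{c + "_new": 1})

print(literal_names)  # ['c', 'c']
print(dynamic_names)  # ['num_unique_words_new', 'num_words_over_6_new']
```

The same mechanism is what lets assign() receive column names computed at run time.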

Dask.dataframe solution

For your particular problem, I'd suggest the following:

d = {col: df[col] / df['num_words'] for col in percent_cols}
df = df.assign(**d)

Consider doing this in Pandas as well

The .assign method is also available in Pandas, and it may be faster than using .apply.
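
As a rough check that the two approaches agree, the sketch below runs both on a small made-up DataFrame. The sample values are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    'num_words': [10, 20, 40],
    'num_unique_words': [5, 10, 10],
    'num_words_over_6': [1, 5, 20],
})
percent_cols = ['num_unique_words', 'num_words_over_6']

def find_fraction(row, col):
    return row[col] / row['num_words']

# Row-wise version from the question: one Python call per row, per column.
via_apply = df.copy()
for c in percent_cols:
    via_apply[c] = via_apply.apply(find_fraction, col=c, axis=1)

# Column-wise version: one vectorized division per column, with the
# column names passed to assign() through ** unpacking.
via_assign = df.assign(**{c: df[c] / df['num_words'] for c in percent_cols})

print(via_assign)
```

Both versions overwrite the columns listed in percent_cols with fractions of num_words; the assign version avoids calling a Python function for every row.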
