python - Is there a way of performing arithmetic operations on entire Frame in Python datatable?

Question

This question is about the recent h2o datatable package. I want to replace pandas code with this library to enhance performance.

The question is simple: I need to apply scalar arithmetic (divide, add, multiply, subtract) to an entire Frame or to several selected columns.

In pandas, to divide all the columns excluding the first by 3, one could write:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "C0": np.random.randn(10000), 
    "C1": np.random.randn(10000)
})
df.iloc[:,1:] = df.iloc[:,1:]/3

In the datatable package, I can only do this one column at a time:

import datatable as dt
from datatable import f

df = dt.Frame(np.random.randn(1000000))
df[:, "C1"] = dt.Frame(np.random.randn(1000000))
for i in range(1, df.shape[1]):
    df[:, i] = df[:, f[i] / 3]

So far, in Python 3.6 (I don't know about 3.7), the FrameProxy f doesn't accept slices. I'm asking whether there is a better way than a loop to perform this kind of Frame arithmetic; I haven't found one in the documentation.

EDIT:

The latest commit, #1962, added a feature related to this question. If I can get the latest source version running, I'll add an answer myself covering that new feature.

score 4 · Accepted Answer

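You are correct that the f-symbol does not currently support slice expressions (by the way, this is an interesting idea; maybe it could be added in the future?).

However, the right-hand side of an assignment can be a list of expressions, which lets you write the following: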

df = dt.Frame(C0=np.random.randn(1000000),
              C1=np.random.randn(1000000))

df[:, 1:] = [f[i]/3 for i in range(1, df.ncols)]

score 3 · Accepted Answer

As of January 2019, the datatable versions installed via pip for both Python 3.6 and 3.7 support slices in f-expressions, and this is documented. So the solution is simple:

import datatable as dt
from datatable import f
import numpy as np

# generate some data to test
df = dt.Frame(C0=np.random.randn(1000000),
              C1=np.random.randn(1000000))

df[:, 1:] = df[:, f[1:]/3]
