This question is about the recent h2o datatable package. I want to replace pandas code with this library to enhance performance.
The question is simple: I need to apply an arithmetic operation (add, subtract, multiply, or divide by a number) to an entire Frame or to several selected columns.
In pandas, to divide all the columns excluding the first by 3, one could write:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "C0": np.random.randn(10000),
    "C1": np.random.randn(10000),
})
df.iloc[:,1:] = df.iloc[:,1:]/3
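For reference, the other operations follow the same pattern in pandas. Here is a minimal sketch on a small fixed frame, so the results are easy to check by eye. The frame contents and the names `cols`, `divided`, `added`, `multiplied`, and `subtracted` are only illustrative and are not part of my real code:

```python
import pandas as pd

# Small deterministic frame (illustrative values only).
df = pd.DataFrame({
    "C0": [1.0, 2.0, 3.0],
    "C1": [3.0, 6.0, 9.0],
    "C2": [30.0, 60.0, 90.0],
})

# Select every column except the first.
cols = df.columns[1:]

# Each operation is applied to the selected columns in one vectorized step;
# the first column is left untouched.
divided = df.copy()
divided[cols] = divided[cols] / 3

added = df.copy()
added[cols] = added[cols] + 3

multiplied = df.copy()
multiplied[cols] = multiplied[cols] * 3

subtracted = df.copy()
subtracted[cols] = subtracted[cols] - 3
```

This is the behaviour I would like to reproduce with datatable: one expression that covers all selected columns, without an explicit Python loop.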
In the datatable package, I can only do this one column at a time:
import datatable as dt
import numpy as np
from datatable import f
df = dt.Frame(np.random.randn(1000000))
df[:, "C1"] = dt.Frame(np.random.randn(1000000))
# Divide every column except the first by 3, one column at a time
for i in range(1, df.shape[1]):
    df[:, i] = df[:, f[i] / 3]
As of now, under Python 3.6 (I haven't checked 3.7), the FrameProxy f doesn't accept slices. Is there a better way to perform this kind of Frame arithmetic than a loop? I haven't found one in the documentation.
EDIT:
The latest commit (#1962) adds a feature related to this question. If I manage to run the latest source version, I'll post an answer myself that uses the new feature.