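I ran into a similar problem and solved it in base R as follows. I tend to make things harder for myself by avoiding third-party packages such as R6. The trick is to get hold of the environment in which the object's methods are defined and store the variables there.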
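The trick can be isolated in a few lines. The sketch below uses a hypothetical `Counter` class (not part of the original answer) that declares no fields. Each call stores a count in the environment returned by `environment(.self$bump)`. Because reference class methods are enclosed by the individual object's environment, two instances keep separate counts.

```r
library(methods)

# Hypothetical class used only to isolate the trick: no declared fields,
# the counter lives in the environment that encloses the method.
Counter <- setRefClass(
  "Counter",
  methods = list(
    bump = function() {
      # For a reference class method this is the object's own environment
      env <- environment(fun = .self$bump)
      env$n_ <- get0("n_", envir = env, inherits = FALSE, ifnotfound = 0) + 1
      env$n_
    }
  )
)

a <- Counter$new()
b <- Counter$new()
a$bump()             # first call on a
a_count <- a$bump()  # second call on a -> 2
b_count <- b$bump()  # first call on b  -> 1
```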
In this example, I implement a `MinMaxScaler` like the one in scikit-learn:
```r
## Base reference class
setRefClass(
  "Transformer",
  contains = "VIRTUAL",
  methods = list(
    fit = function(data) stop("Must implement"),
    transform = function(data) stop("Must implement"),
    fit_transform = function(data) {
      fit(data)
      transform(data)
    }
  )
)
```
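Because the base `fit()` and `transform()` only call `stop()`, a subclass that forgets to override them fails loudly as soon as `fit_transform()` runs. A quick check, using a hypothetical subclass named `Incomplete`:

```r
library(methods)

## Base reference class, as above
setRefClass(
  "Transformer",
  contains = "VIRTUAL",
  methods = list(
    fit = function(data) stop("Must implement"),
    transform = function(data) stop("Must implement"),
    fit_transform = function(data) {
      fit(data)
      transform(data)
    }
  )
)

# Hypothetical subclass that does not override fit() or transform()
Incomplete <- setRefClass("Incomplete", contains = "Transformer")

obj <- Incomplete$new()
# fit_transform() reaches the base fit(), which raises the error
msg <- tryCatch(obj$fit_transform(1:3), error = conditionMessage)
```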
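Here is a concrete implementation of the `Transformer` API. In the `fit` method, I grab the environment in which `fit` is defined. I then use that environment to store whatever variables later computations need, updating the object in place, just as sklearn does.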
```r
MinMaxScaler <- setRefClass(
  "MinMaxScaler",
  contains = "Transformer",
  fields = c(feature_range = "numeric"),
  methods = list(
    fit = function(data) {
      # The environment enclosing this method is the object's own
      # environment, so anything stored here is private to this instance
      env <- environment(fun = .self$fit)
      rng <- range(data, na.rm = TRUE)
      env$data_min_ <- rng[[1]]
      env$data_max_ <- rng[[2]]
      env$data_range_ <- diff(rng)
      invisible(.self)  # return the fitted object, as sklearn's fit() does
    },
    transform = function(data) {
      # Read the state stored by fit() from the same environment
      env <- environment(fun = .self$transform)
      scalef <- diff(range(feature_range))
      scalef * (data - env$data_min_) / env$data_range_ + min(feature_range)
    }
  )
)
```
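For reference, `transform` applies the usual min-max formula, where $[a, b]$ is `feature_range` (so $b - a$ is `scalef` and $a$ is `min(feature_range)`) and $x_{\min}$, $x_{\max}$ are the stored `data_min_` and `data_max_`:

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}\,(b - a) + a
$$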
To demonstrate the pattern, I create two scalers and fit each one separately:
```
> ## Dummy data
> set.seed(123)
> z <- rnorm(1e4)
> summary(z)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-3.845320 -0.667969 -0.011089 -0.002372  0.673347  3.847768 
>
> scaler1 <- MinMaxScaler(feature_range=c(0, 50))
> summary(scaler1$fit_transform(z))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   20.65   24.92   24.98   29.37   50.00 
>
> scaler2 <- MinMaxScaler(feature_range=c(-100, 100))
> summary(scaler2$fit_transform(z))
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-100.00000  -17.39725   -0.32011   -0.09347   17.47344  100.00000 
>
> ## to show the scalers are distinct and not sharing private vars
> summary(scaler1$transform(z))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   20.65   24.92   24.98   29.37   50.00 
> summary(scaler2$transform(z))
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-100.00000  -17.39725   -0.32011   -0.09347   17.47344  100.00000 
```
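A smaller check with hand-computable numbers shows that each scaler keeps its own `data_min_` and `data_range_` and applies them to data it was not fitted on. This sketch repeats the two classes so it runs on its own; having `fit()` return `invisible(.self)` is an assumption borrowed from sklearn's convention, not something the scaling depends on.

```r
library(methods)

setRefClass(
  "Transformer",
  contains = "VIRTUAL",
  methods = list(
    fit = function(data) stop("Must implement"),
    transform = function(data) stop("Must implement"),
    fit_transform = function(data) {
      fit(data)
      transform(data)
    }
  )
)

MinMaxScaler <- setRefClass(
  "MinMaxScaler",
  contains = "Transformer",
  fields = c(feature_range = "numeric"),
  methods = list(
    fit = function(data) {
      env <- environment(fun = .self$fit)  # per-object environment
      rng <- range(data, na.rm = TRUE)
      env$data_min_ <- rng[[1]]
      env$data_max_ <- rng[[2]]
      env$data_range_ <- diff(rng)
      invisible(.self)
    },
    transform = function(data) {
      env <- environment(fun = .self$transform)
      scalef <- diff(range(feature_range))
      scalef * (data - env$data_min_) / env$data_range_ + min(feature_range)
    }
  )
)

s1 <- MinMaxScaler$new(feature_range = c(0, 50))
s1$fit(c(0, 10))       # min 0, range 10
s2 <- MinMaxScaler$new(feature_range = c(-1, 1))
s2$fit(c(100, 200))    # min 100, range 100

# Values outside the fitted range are not clipped
out1 <- s1$transform(c(0, 5, 10, 20))  # 0 25 50 100
out2 <- s2$transform(c(100, 150, 200)) # -1 0 1
```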