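Different decomposition methods can be seen as interchangeable gears in a statistical machine that is useful to you, its creator.
To pick the best gear, the metric you evaluate is not necessarily about the gear itself. It is about how well the machine as a whole performs when each gear is plugged in, one at a time.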
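Ignore gear specs:
You have several gears, each with its own factory-validated specs printed on the packaging. Those numbers, summaries and specs may not be what you want. The candidate gears will not all report the same metrics, which makes a fair comparison difficult. Moreover, these metrics all describe the gear, not your particular machine. Do not follow the blog's suggestion of mixing machine metrics into pca.recon(). Let gears be gears, and defer metric evaluation to the machine level.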
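To make the distinction concrete, here is a small base R sketch. It assumes that reconstruction error is the machine-level metric you care about; the variable names are only illustrative. The proportion of variance is the gear's own spec and exists only for PCA-like gears, whereas the MSE is measured on the machine's output and could be computed for any gear that can reconstruct.

```r
# Sketch: a gear's own "spec" versus a metric measured at the machine level.
X <- as.matrix(iris[, 1:4])
k <- 2

pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Gear spec: cumulative proportion of variance for k components (PCA-specific).
gear_spec <- cumsum(pca$sdev^2)[k] / sum(pca$sdev^2)

# Machine metric: decompose, reconstruct, and compare with the input.
scores <- pca$x[, 1:k, drop = FALSE]
X_rc <- scores %*% t(pca$rotation[, 1:k, drop = FALSE])
X_rc <- sweep(X_rc, 2, pca$center, "+")   # add the column means back
machine_metric <- mean((X - X_rc)^2)

print(c(gear_spec = gear_spec, machine_metric = machine_metric))
```

The two numbers live on different scales and answer different questions, which is why only the second one is a fair basis for comparing unrelated gears.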
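Does the gear fit?: You need to check, for your particular machine, that every candidate gear actually fits inside it. The gears of your decompose/reconstruct machine must be able to turn in both directions. t-SNE is designed to turn forward only (decompose), so no meaningful evaluation is possible there. The same goes for UMAP. Perhaps the whole reconstruction-loss benchmark is not the machine you actually wanted to use in the first place. Maybe it was only a side project for picking gears for another machine... If your machine is meant to draw pretty plots, a good quantitative benchmark is hard to come by. If your machine is an initial decomposition combined with a simple classifier, then the t-SNE gear fits just fine, and some prediction-accuracy metric may be useful for choosing between gears.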
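A machine whose gears only need to turn forward could look like the sketch below: decompose, then score the embedding with a simple classifier. The names pca_forward, classifier_machine and loo_1nn_accuracy are made up for this illustration, and leave-one-out 1-nearest-neighbour accuracy is just one possible choice of metric. A t-SNE gear (for example a wrapper around the Rtsne package, not shown here) would only have to return an embedding to fit this machine.

```r
# Sketch: a machine that only needs the forward direction of a gear.
X <- as.matrix(iris[, 1:4])
y <- iris$Species

# Forward-only gear: returns an n x N embedding, no reconstruct() needed.
pca_forward <- function(X, N = 2, ...) {
  prcomp(X, rank. = N)$x
}

# Metric: leave-one-out accuracy of a 1-nearest-neighbour classifier.
loo_1nn_accuracy <- function(Z, y) {
  D <- as.matrix(dist(Z))
  diag(D) <- Inf                      # a point may not vote for itself
  nearest <- apply(D, 1, which.min)   # index of each point's nearest neighbour
  mean(y[nearest] == y)
}

# Machine: plug in the gear, then classify in the embedded space.
classifier_machine <- function(gear, data, labels, ...) {
  Z <- gear(data, ...)
  loo_1nn_accuracy(Z, labels)
}

acc <- sapply(1:4, function(N) classifier_machine(pca_forward, X, y, N = N))
names(acc) <- paste0("pca_N", 1:4)
print(acc)
```

Because the metric is computed on the machine's output (predicted labels), it does not matter that the gear has no backward direction.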
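Interfacing the various gears: Because their sizes and shapes differ, gears will not fit into your machine out of the box. Each gear has to be adapted individually. You may be tempted to refit the machine to each gear, which is fine for a handful of gears. That amounts to copy-pasting your machine code and then plugging in and adapting each gear by hand. A more scalable approach is to interface only the gears, so that you can keep them in a bag next to the machine and let a robot plug in one gear at a time and write you a report. This is a main selling point of frameworks such as sklearn, caret and keras. You can also code it yourself. Here is a simple example: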
rm(list=ls())
#some data
X <- iris[,c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")]
#my_gear, prcomp wrapped in an interface
#any gear must have the gear(X, N, ...) signature
pca_decompose <- function(X, N = 2, ...) {
  #implement gear forward (decompose)
  pca <- prcomp(
    X, rank. = N,
    scale. = FALSE #must be FALSE: the reconstructor below does not support re-scaling, because I'm lazy
  )
  #implement gear backward (reconstruct)
  reconstruct <- function(Xnew = pca$x) {
    # a pca reconstructor similar to the function from the blog; pca is already in the closure
    # I think the blog mistakenly referred to pca$x instead of x in places
    pca.recon <- function(x, k) {
      # drop = FALSE keeps the matrix shape when k = 1
      x_recon <- x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
      #slightly more efficient way to reapply the center
      for (i in seq_along(pca$center)) x_recon[, i] <- x_recon[, i] + pca$center[i]
      return(x_recon)
    }
    X_rc <- pca.recon(Xnew, k = N)
    return(X_rc)
  }
  #wrap up the interface
  self <- list(
    X_decomposed = pca$x, # every gear must name its decomposition X_decomposed
    reconstruct = reconstruct
  )
  class(self) <- c("my_pca", "my_universal_gear")
  return(self)
}
#define a machine with the relevant use case
my_machine <- function(gear, data, ...) {
  dc_obj <- gear(data, ...)
  data_rc <- dc_obj$reconstruct(dc_obj$X_decomposed)
  return(data_rc) #return the reconstruction explicitly
}
#define the most useful metric
my_metric <- function(X, Y) {
  # this 'multivariate' MSE is not commonly used, I think,
  # but whatever floats your boat
  mean((X - Y)^2)
}
#define how to evaluate:
#try the gear in the machine and measure the outcome with the metric
my_evaluation <- function(gear, machine, data, metric = my_metric, ...) {
  data <- as.matrix(data)
  output <- machine(gear, data, ...)
  metric(data, output) #use the metric that was passed in, not the global my_metric
}
#useful syntactic sugar
set_params <- function(gear, ...) {
  params <- list(...)
  function(...) do.call(gear, c(list(...), params))
}
#evaluate a gear
my_evaluation(
  gear = pca_decompose,
  machine = my_machine,
  data = X,
  #gear params
  N = 2
)
#the same as
my_evaluation(
  gear = set_params(pca_decompose, N = 2), #nice to preset gear params
  machine = my_machine,
  data = X
)
#define all gears to evaluate
#in another use case the gearbag could just as well be a grid of hyper-parameters to search
my_gearbag <- list(
  pca_dc_N1 = set_params(pca_decompose, N = 1),
  pca_dc_N2 = set_params(pca_decompose, N = 2),
  pca_dc_N3 = set_params(pca_decompose, N = 3),
  pca_dc_N4 = set_params(pca_decompose, N = 4)
  #also put an autoencoder or whatever in the gearbag
)
my_robot <- function(evaluation, machine, gearbag, data) {
  results <- sapply(
    X = gearbag, #this X is not the data but sapply's placeholder for what to iterate over
    FUN = evaluation,
    machine = machine,
    data = data #use the data argument, not whatever X happens to be in the global environment
  )
  report <- list(
    README = "metric results for gears",
    results = results
  )
  return(report)
}
my_report <- my_robot(my_evaluation, my_machine, my_gearbag, data = X)
print(my_report)
This prints:
$README
[1] "metric results for gears"
$results
pca_dc_N1 pca_dc_N2 pca_dc_N3 pca_dc_N4
8.560431e-02 2.534107e-02 5.919048e-03 1.692109e-31
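Any other two-way gear can go into the bag as long as it exposes the same interface. As a sketch, here is a deliberately naive gear that projects onto random orthonormal directions. The name random_projection_decompose and its seed argument are hypothetical and not part of the example above. A minimal machine and metric are repeated so that the block runs on its own.

```r
# Sketch: a second gear with the same interface (X_decomposed + reconstruct).
X <- as.matrix(iris[, 1:4])

# Hypothetical gear: project centered data onto N random orthonormal directions.
random_projection_decompose <- function(X, N = 2, seed = 1, ...) {
  set.seed(seed)                      # note: this resets the global RNG state
  center <- colMeans(X)
  # p x N matrix with orthonormal columns, from the QR of a random matrix
  Q <- qr.Q(qr(matrix(rnorm(ncol(X) * N), nrow = ncol(X), ncol = N)))
  Z <- sweep(X, 2, center, "-") %*% Q # forward: decompose
  reconstruct <- function(Znew = Z) { # backward: map back and re-center
    sweep(Znew %*% t(Q), 2, center, "+")
  }
  self <- list(X_decomposed = Z, reconstruct = reconstruct)
  class(self) <- c("my_random_projection", "my_universal_gear")
  self
}

# Minimal stand-ins for the machine and the metric.
machine <- function(gear, data, ...) {
  obj <- gear(data, ...)
  obj$reconstruct(obj$X_decomposed)
}
mse <- function(X, Y) mean((X - Y)^2)

res <- sapply(1:4, function(N) mse(X, machine(random_projection_decompose, X, N = N)))
names(res) <- paste0("rp_dc_N", 1:4)
print(res)
```

With all four directions the reconstruction is exact up to floating-point error, just as for PCA. With fewer directions the random gear cannot beat PCA on this metric, which is the kind of comparison the report is meant to surface.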