r - Making predictions when the model.matrix row count differs from the test data frame in neuralnet

Question

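I recently asked the following question about a "requires numeric/complex matrix/vector arguments" error I got while using the neuralnet library. This was my original question: "First time using neural nets in R: getting 'requires numeric/complex matrix/vector arguments' but don't know how to correct it".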

The solution was to use the model.matrix function to convert the factors in my data frame into "dummy" variables. The resulting code is as follows:

matrix.train <- model.matrix( 
  ~ survived + pclass + sex + age + sibsp + parch + fare + embarked, 
  data = train
)
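To illustrate what that dummy encoding looks like, here is a minimal sketch using a hypothetical three-row toy data frame (not the real Titanic data). With R's default treatment contrasts, the first level of a factor becomes the baseline, so a two-level factor like sex turns into a single 0/1 column named sexmale, next to an intercept column of 1s.

```r
# Hypothetical toy data frame standing in for the real `train` data
toy <- data.frame(
  survived = c(0, 1, 1),
  sex      = factor(c("male", "female", "female")),
  age      = c(22, 38, 26)
)

# model.matrix() expands the factor `sex` into 0/1 dummy columns.
# Levels sort alphabetically, so "female" is the baseline and only a
# "sexmale" column is created; numeric columns are copied unchanged.
m <- model.matrix(~ survived + sex + age, data = toy)
print(m)
print(colnames(m))
```

This is also where the sexmale and embarkedC/embarkedQ/embarkedS names used in the neuralnet formula come from.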

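Because my source data frame contains many scattered NA values, the resulting matrix ended up with 714 rows instead of the original data frame's 891 rows.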
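The row loss comes from the model frame that model.matrix() builds internally: it applies the session's na.action option, which is normally na.omit, so any row with an NA in any of the formula's variables is dropped. A minimal sketch with a hypothetical four-row toy data frame shows the effect, and also shows that the surviving rows keep their original row names:

```r
# Hypothetical toy data: row 2 has a missing age, row 3 a missing sex
toy <- data.frame(
  survived = c(0, 1, 1, 0),
  sex      = factor(c("male", "female", NA, "male")),
  age      = c(22, NA, 26, 35)
)

# With the usual default na.action (na.omit), model.matrix() silently
# drops every row that has an NA in any variable of the formula.
m <- model.matrix(~ survived + sex + age, data = toy)

print(nrow(toy))    # rows in the data frame
print(nrow(m))      # rows left in the matrix
print(rownames(m))  # original row names of the rows that were kept
```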

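That is fine for my training data. However, when I load my test data frame and convert it to a matrix, I run into the same problem. This time I get 331 matrix rows, while my source data frame has 418 rows.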

After I use compute to apply the model to my test data, I can't cbind my predictions back onto my test data because the number of rows differs. So my question is:

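Is there a way to force model.matrix to output the same number of rows as the source data frame, without dropping the rows that contain NA? My model needs to be able to handle NAs and still output predictions, because rows with at least one NA are common. Alternatively, would it be better to tell the neural net to treat NA values as a valid factor level?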
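Both ideas in the question can be sketched with base R, again on a hypothetical toy data frame. Building the model frame yourself with na.action = na.pass and handing it to model.matrix() keeps every row, with NA only in the affected cells. For factors, addNA() turns "missing" into an explicit level, so model.matrix() produces an extra dummy column (here sexNA) instead of an NA cell. Note that numeric NAs such as a missing age remain NA either way, and NA propagates through arithmetic, so those cells would still need imputing or encoding before the network can produce a prediction for that row.

```r
# Hypothetical toy data: row 2 has a missing age, row 3 a missing sex
toy <- data.frame(
  survived = c(0, 1, 1, 0),
  sex      = factor(c("male", "female", NA, "male")),
  age      = c(22, NA, 26, 35)
)
f <- ~ survived + sex + age

# Option 1: build the model frame with na.pass so no rows are dropped.
# Because `mf` already carries a "terms" attribute, model.matrix() uses
# it as-is instead of rebuilding (and re-filtering) the frame.
mf <- model.frame(f, data = toy, na.action = na.pass)
m_pass <- model.matrix(f, data = mf)
print(m_pass)

# Option 2: make a missing sex its own factor level with addNA().
# na.pass is still needed here because `age` also contains an NA.
toy2 <- toy
toy2$sex <- addNA(toy2$sex)
mf2 <- model.frame(f, data = toy2, na.action = na.pass)
m_level <- model.matrix(f, data = mf2)
print(m_level)
```

Option 2 only helps for factor columns such as sex or embarked; it is a choice about encoding, and whether "missing" is a meaningful category is a modeling decision.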

Here is the code I have been trying to use so far:

#Build a matrix from training data (714 rows vs 891 rows due to NAs in data) 
matrix.train <- model.matrix(
  ~ survived + pclass + sex + age + sibsp + parch + fare + embarked, 
  data=train
)

library(neuralnet)

#Train the neural net
net <- neuralnet(
  survived ~ pclass + sexmale + age + sibsp + parch + fare + embarkedC + 
  embarkedQ + embarkedS, data=matrix.train, hidden=10, threshold=0.01
)

#Build a matrix from test data (331 rows vs 418 rows due to NAs in data)
matrix.test <- model.matrix(~ pclass + sex + age + sibsp + parch + fare + embarked, 
  data=test
)

#Apply neural net to test matrix 
net.results <- compute(
  net, matrix.test
)

#Attempt to map results back to original test data
cleanoutput <- cbind(
  net.results$net.result,test
)

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 331, 418
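One possible way around the cbind error, sketched here under assumptions: rather than binding columns side by side, write each prediction back onto its original test row using the row names that model.matrix() keeps for the rows it retained, and leave NA for the dropped rows. The test frame and the pred vector below are hypothetical stand-ins; the sketch assumes the predictions come back in the same row order as matrix.test, as they would from a row-wise matrix computation.

```r
# Hypothetical stand-in for the 418-row `test` data frame
test <- data.frame(
  pclass = c(3, 1, 2, 3),
  age    = c(34.5, NA, 62, 27)
)
matrix.test <- model.matrix(~ pclass + age, data = test)  # row "2" dropped

# Hypothetical stand-in for net.results$net.result:
# one prediction per row of matrix.test, in the same order.
pred <- c(0.1, 0.8, 0.3)

# Find where each matrix row came from in `test`, then fill in only
# those rows; rows dropped because of NAs keep NA as their prediction.
idx <- match(rownames(matrix.test), rownames(test))
test$prediction <- NA_real_
test$prediction[idx] <- pred
print(test)
```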

When I try to use the rownames from the train data frame to force the model.matrix output to have the same number of rows, I get the following:

matrix.train <- matrix.train[match(rownames(train),rownames(matrix.train)),]

> matrix.train
    (Intercept) survived pclass sexmale   age sibsp parch     fare embarkedC embarkedQ embarkedS
1             1        0      3       1 22.00     1     0   7.2500         0         0         1
2             1        1      1       0 38.00     1     0  71.2833         1         0         0
3             1        1      3       0 26.00     0     0   7.9250         0         0         1
4             1        1      1       0 35.00     1     0  53.1000         0         0         1
5             1        0      3       1 35.00     0     0   8.0500         0         0         1
6            NA       NA     NA      NA    NA    NA    NA       NA        NA        NA        NA
7             1        0      1       1 54.00     0     0  51.8625         0         0         1

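However, that row of NAs is inaccurate. In reality, that row may contain only a single NA value, but for some reason, whenever a row contains even one NA value, the matrix turns the entire row into NA. Instead of the output above, this is what I would like to see: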

> matrix.train
    (Intercept) survived pclass sexmale   age sibsp parch     fare embarkedC embarkedQ embarkedS
1             1        0      3       1 22.00     1     0   7.2500         0         0         1
2             1        1      1       0 38.00     1     0  71.2833         1         0         0
3             1        1      3       0 26.00     0     0   7.9250         0         0         1
4             1        1      1       0 35.00     1     0  53.1000         0         0         1
5             1        0      3       1 35.00     0     0   8.0500         0         0         1
6             1        0      3       1    NA     0     0   6.2500         1         0        NA
7             1        0      1       1 54.00     0     0  51.8625         0         0         1
