I can't get caret's rfe to work. Starting from something known, the example at http://machinelearningmastery.com/feature-selection-with-the-caret-r-package/ works flawlessly.

However, when I substitute my own dataset, it fails:

> results <- rfe(x, y, sizes=c(1:5), rfeControl=control)
Error in rfe.default(x, y, sizes = c(1:5), rfeControl = control) : 
  there should be the same number of samples in x and y

As far as I can tell, x and y have the same number of sample rows:

> nrow(x)
[1] 691231
> nrow(y)
[1] 691231

Details are below.

I've looked at similar questions, e.g. 'R rfe function "caret" Package error: should have the same number of samples in x and y' and 'R trying to get caret / rfe to work'. The latter is related but doesn't seem to help. I've tried converting my y to a vector

> y <- as.vector(y)

or

> y <- as.vector(as.list(y))

but the error persists. Surely I'm doing something silly; I just can't see where I'm going wrong. Any help is appreciated.

:-)

亚克

- - - - - - - - - - - Details - - - - - - -

- - - Script - - - -

library(feather)
library(mlbench)
library(caret)

path <- "faultclass.feather"
df <- read_feather(path)

set.seed(7)
control <- rfeControl(functions=rfFuncs, method="cv", number=10)
x <- subset(df,select=-c(fault))
y <- df["fault"]*1
results <- rfe(x, y, sizes=c(1:5), rfeControl=control)
print(results)
predictors(results)
plot(results, type=c("g", "o"))

- - - Characteristics - - -

> str(x)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   691231 obs. of  31 variables:
 $ A               : chr  "2011-12-06 00:00:00" "2011-03-11 00:00:00" "2014-11-17 00:00:00" "2013-01-07 15:19:02" ...
 $ B               : num  6 6 11 11 6 6 6 6 6 6 ...
 $ C               : num  NA NA NA NA NA NA NA NA NA NA ...
 $ D               : chr  "2016-01-01 00:00:00" "2016-01-01 00:00:00" "2016-01-01 00:00:00" "2016-01-01 00:00:00" ...
 $ E               : chr  NA NA NA NA ...
 $ F               : num  0 230 230 230 230 230 230 230 230 0 ...
 $ G               : num  13 35 38 128 12 6 10 4 2 6 ...
 $ H               : chr  NA NA NA NA ...
 $ J               : chr  "35" "35" "28" "34" ...
 $ K               : num  0 63 32 63 40 40 35 40 35 25 ...
 $ L               : num  3 3 3 3 3 3 3 2 2 2 ...
 $ M               : num  301 301 301 301 301 301 301 301 301 301 ...
 $ N               : chr  "613.0" "9630.0" "9114.0" "600.0" ...
 $ O               : chr  "000356039" "000664676" "000770082" "000617804" ...
 $ P               : chr  "11610000" "0000003001" "1161000" "43850" ...
 $ Q               : num  10089 10089 10972 27629 27630 ...
 $ R               : num  7.07e+17 7.07e+17 7.07e+17 7.07e+17 7.07e+17 ...
 $ S               : num  1 1 1 1 1 1 1 1 1 1 ...
 $ T               : chr  "XX" "XX" "809" "96" ...
 $ U               : chr  "cac" "edr" "ssr" "nsk" ...
 $ V               : chr  "1954-05-17 00:00:00" "1973-05-17 00:00:00" "1997-06-24 00:00:00" "1976-12-24 00:00:00" ...
 $ W               : num  287 287 287 665 664 664 664 664 664 664 ...
 $ X               : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Y               : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Z               : num  24828 39591 8932 35162 28540 ...
 $ AA              : chr  "0001" "0001" "0001" "0002" ...
 $ AB              : chr  "0001-TRA" "0001-TRB" "0001-TRC" "0002-TRD" ...
 $ AC              : chr  "0,230" "0,230" "0,230" "0,230" ...
 $ AD              : chr  "K03" "K03" "K03" "K05" ...
 $ AE              : num  3 3 3 3 3 3 3 3 3 3 ...
 $ AF              : chr  "IT" "IT" "IT" "IT" ...

> str(y)
'data.frame':   691231 obs. of  1 variable:
 $ fault: num  0 0 0 0 0 0 0 0 0 0 ...

1 Answer

I ran into the same problem and eventually got it working by using

y = as.vector(unlist(c(y)))

I'm not sure why this differs from

y = as.vector(y)

but it worked for me.
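
A likely explanation, offered as a sketch rather than a confirmed reading of the caret source: the error message suggests rfe compares the number of rows in x with length(y). For a one-column data frame, length() counts columns, so it returns 1 even though nrow() reports the full row count. as.vector() does not flatten a list-like object such as a data frame, whereas unlist() does. The toy data frame y_df below is a hypothetical stand-in for the real y, and only base R is used.

```r
# Toy stand-in for the one-column data frame y (hypothetical data).
y_df <- data.frame(fault = c(0, 1, 0, 0, 1))

nrow(y_df)    # 5: number of rows
length(y_df)  # 1: a data frame is a list of columns, so this counts columns

# as.vector() does not flatten a list-like object, so the length is still 1.
y_bad <- as.vector(y_df)
length(y_bad)

# c() turns the data frame into a plain list of columns, unlist() flattens
# that list into a named numeric vector, and as.vector() drops the names.
y_ok <- as.vector(unlist(c(y_df)))
is.numeric(y_ok)
length(y_ok)

# Extracting the column directly gives the same plain vector.
y_alt <- y_df[["fault"]]
identical(y_ok, y_alt)
```

If this reading is right, y <- df[["fault"]] (or df$fault) in the original script would be a simpler way to hand rfe a plain vector than df["fault"], which keeps the data frame wrapper.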

Answered 2017-06-15T14:57:55.897