r - Chi-squared feature selection in R using FSelector

Question

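I am a beginner in R and I have a data frame of binary values. In my data frame, the first 6000 columns are the attributes I want to select features from, and the last 10 columns (also binary) are the classes I need to train on. I learned that I can use the FSelector package to compute the chi-squared value for each attribute, then rank them and select my features. I found this example in the FSelector package: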

# Use the HouseVotes84 data from the mlbench package
library(mlbench)    # for the data
library(FSelector)  # for the method
data(HouseVotes84)

# Calculate the chi-squared statistics
weights <- chi.squared(Class ~ ., HouseVotes84)

# Print the results
print(weights)

# Select the top five variables
subset <- cutoff.k(weights, 5)

# Print the final formula that can be used in classification
f <- as.simple.formula(subset, "Class")
print(f)

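But when I write the same code for my data, R cannot find the object Class after the command weights <- chi.squared(Class ~ ., HouseVotes84). The FSelector package says there should be a formula, but I don't know what kind of formula. Should I write the mathematical formula of the chi-squared test there? If so, what is the point of the package compared with computing the X^2 statistic in a for loop?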

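I would rather not use other packages such as quanteda, because I actually want to avoid typing out the whole chi-squared formula to do feature selection. Do you have any suggestions on how to fix that line of code for my data structure?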
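For context on the formula argument: in R, something like Class ~ . is a model formula ("the column Class, explained by every other column"), not the mathematical formula of the test. The left-hand side has to name a column that exists in the data passed alongside it, which HouseVotes84 has and a data frame without a Class column does not. Below is a minimal base-R sketch of that behaviour. The data frame `toy` and its column `label` are made up for illustration, and FSelector is not called here.

```r
# Hypothetical toy data: two binary features and one class column "label".
toy <- data.frame(
  f1    = c(0, 1, 1, 0, 1, 0),
  f2    = c(1, 1, 0, 0, 1, 0),
  label = factor(c("a", "b", "b", "a", "b", "a"))
)

# The left-hand side names a column of `toy`, so the formula resolves.
ok <- model.frame(label ~ ., data = toy)
print(names(ok))  # response first, then every other column

# "Class" is not a column of `toy`, so R goes looking for an object
# called Class, finds none, and raises an error.
err <- tryCatch(
  model.frame(Class ~ ., data = toy),
  error = function(e) conditionMessage(e)
)
print(err)

# A formula can also be built from a column name held in a string.
f <- reformulate(".", response = "label")
print(f)
```

So the likely fix is to replace Class with the name of one of the class columns in the data, rather than to write out any chi-squared arithmetic.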

Update: here are the first three rows of my data, showing 10 of the 6000 term columns. The last 10 columns are my classes.

   structure(list(rigid = c(0, 0, 0), sobaaox = c(0, 0, 0), intermittententsharpleft = c(0, 
0, 0), pnuemondayia = c(0, 0, 0), medport = c(0, 0, 0), assharp = c(0, 
0, 0), ambult = c(0, 0, 0), cmpliant = c(0, 0, 0), anlk = c(0, 
0, 0), scoliosi = c(0, 0, 0), espec = c(0, 0, 0), `290` = c(0L, 
0L, 0L), `320` = c(0L, 0L, 0L), `390` = c(1L, 0L, 0L), `460` = c(0L, 
0L, 0L), `520` = c(0L, 1L, 0L), `580` = c(0L, 0L, 0L), `710` = c(0L, 
0L, 0L), `780` = c(0L, 0L, 1L), `800` = c(0L, 0L, 0L), `100001` = c(0L, 
0L, 0L)), .Names = c("rigid", "sobaaox", "intermittententsharpleft", 
"pnuemondayia", "medport", "assharp", "ambult", "cmpliant", "anlk", 
"scoliosi", "espec", "290", "320", "390", "460", "520", "580", 
"710", "780", "800", "100001"), row.names = c(NA, 3L), class = "data.frame")
