
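I am working on applying a genetic algorithm (GA) to binary logistic regression. I have a few questions that need clarifying. Can you help me?

  1. Can I use AIC or BIC as the fitness function in the GA? (I used them, and the results suggest the GA is more accurate than the conventional binary logistic model. However, most papers I have found use AUC as the fitness function.)

  2. Following this paper (http://atm.amegroups.com/article/view/18292/html), I tried a GA with AUC as the fitness function, and it gave the error below. Could you create a small reproducible example that gets around this problem?

Error in model.frame.default(formula = as.numeric(tey) ~ predict.glm(trm, : variable lengths differ (found for 'predict.glm(trm, newdata = ted, type = "response")')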


5 Answers


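In the galgo package, the cost function can be customized. Were you able to run the program as described in the paper? For example, you can define AUC as your objective; if you use a neural network for prediction, the following code may help: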

reg.fitness <- function(chr, parent,tr,te,res) {
  try <- as.factor(parent$data$classes[tr])
  trd <-
    data.frame(parent$data$data[tr,as.numeric(chr)])
  trm <- nnet::nnet(try ~ ., data = cbind(trd,try=try),trace=F,
                    size = 5)
  tey <- as.factor(parent$data$classes[te])
  ted <-
    data.frame(parent$data$data[te,as.numeric(chr)])
  pred <- predict(trm,newdata = cbind(ted,tey=tey),type = "raw")
  if(res){
    roc(tey,pred,
        levels=levels(tey),
        direction = "<")$auc
  }
  else{
    predict(trm,newdata=cbind(ted,tey=tey),type="class")
  }

}

You can adjust the model by modifying this line: trm <- nnet::nnet(try ~ ., data = cbind(trd,try=try),trace=F, size = 5)
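As one illustration of swapping the model, the sketch below replaces the nnet call with a plain logistic regression (stats::glm), which is closer to the setting in the question. It rests on several assumptions: mock_parent is a hypothetical stand-in for the object galgo passes as parent, auc_rank is a hypothetical helper that computes AUC from ranks so the example runs without pROC, and the data are simulated. It is not galgo's own API.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
# The second factor level is treated as the positive class.
auc_rank <- function(labels, scores) {
  pos <- labels == levels(labels)[2]
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # average ranks, so ties are handled
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Same signature as the fitness function above, with glm instead of nnet.
glm.fitness <- function(chr, parent, tr, te, res) {
  try <- as.factor(parent$data$classes[tr])
  trd <- data.frame(parent$data$data[tr, as.numeric(chr), drop = FALSE])
  trm <- glm(try ~ ., data = cbind(trd, try = try), family = binomial)
  tey <- as.factor(parent$data$classes[te])
  ted <- data.frame(parent$data$data[te, as.numeric(chr), drop = FALSE])
  pred <- predict(trm, newdata = ted, type = "response")
  if (res) {
    auc_rank(tey, pred)  # fitness: AUC on the test split
  } else {
    # predicted class labels, thresholding the probability at 0.5
    factor(levels(tey)[(pred > 0.5) + 1], levels = levels(tey))
  }
}

# Simulated data and a mock of the structure the function reads from `parent`.
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - x[, 2]))
mock_parent <- list(data = list(data = x, classes = y))
tr <- 1:150
te <- 151:200

auc <- glm.fitness(c(1, 2), mock_parent, tr, te, res = TRUE)
cls <- glm.fitness(c(1, 2), mock_parent, tr, te, res = FALSE)
```

Note that newdata here holds only the test-set predictors, so the prediction vector has the same length as tey, which is what the AUC calculation needs.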

answered 2020-04-08T09:04:27.670

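The problem is that you are using admit to predict admit, which of course gives an AUC of 1. Here is the modified code, which runs as expected on my computer.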

library(pROC)
library(galgo)
library(rtkore)
library(Rcpp)
library(aod)
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")

reg.fitness = function(chr, parent,tr,te,res) {
  try=as.factor(parent$data$classes[tr])
  trd = data.frame(parent$data$data[tr,as.numeric(chr)])
  trm = nnet::nnet(try ~ ., data = cbind(trd,try=try),trace=F,size = 2)
  tey = as.factor(parent$data$classes[te])
  ted = data.frame(parent$data$data[te,as.numeric(chr)])
  pred=predict(trm,newdata = cbind(ted,tey=tey),type = "raw")
  if(res){
    roc(tey,pred,levels=levels(tey), direction = "<")$auc
  }
  else{
    predict(trm,newdata=cbind(ted,tey=tey),type="class")
  }
}

# mydata[,-1] drops the first column (admit), so the outcome is not among the predictors
reg.bb = configBB.VarSel(data=t(mydata[,-1]),
                          classes=mydata$admit,
                          classification.method="user",
                          classification.userFitnessFunc=reg.fitness,
                          chromosomeSize=2, niches=1, maxSolutions=10,
                          goalFitness = 0.9, saveVariable="reg.bb",
                          saveFrequency=50, saveFile="reg.bb.Rdata",
                          main="Logistic")

blast(reg.bb)

plot(blast(reg.bb))
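To see why a predictor set that contains the outcome itself yields an AUC of 1, here is a minimal base-R sketch on simulated data. The variable names and the auc_rank helper are hypothetical and are not part of galgo or pROC; the simulated admit and gre only mimic the column names of binary.csv.

```r
# Hypothetical helper: AUC from ranks (Mann-Whitney identity), labels coded 0/1.
auc_rank <- function(y, score) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(rank(score)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(42)
n <- 300
gre <- rnorm(n)                            # simulated predictor
admit <- rbinom(n, 1, plogis(0.8 * gre))   # simulated 0/1 outcome

# Leaky design: the outcome is also one of the candidate predictors.
leaky <- data.frame(admit_copy = admit, gre = gre)
# Clean design: the outcome is excluded from the predictors.
clean <- data.frame(gre = gre)

# Scoring with the leaked column separates the classes perfectly.
auc_leaky <- auc_rank(admit, leaky$admit_copy)

# Scoring with a model fitted on the clean predictors does not.
fit <- glm(admit ~ gre, data = clean, family = binomial)
auc_clean <- auc_rank(admit, predict(fit, type = "response"))
```

In binary.csv the columns are admit, gre, gpa, rank, so mydata[,-1] removes admit, whereas mydata[,-ncol(mydata)] removes rank and leaves admit among the candidate variables.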

answered 2020-04-18T23:28:14.683

@Z. Zhang, here is a reproducible example of my code.

library(pROC)
library(galgo) 
library(rtkore)
library(Rcpp)
library(aod)
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
attach(mydata)


reg.fitness = function(chr, parent,tr,te,res) {
  try=as.factor(parent$data$classes[tr])
  trd =
    data.frame(parent$data$data[tr,as.numeric(chr)])
  trm = nnet::nnet(try ~ ., data = cbind(trd,try=try),trace=F,size = 2)
  tey = as.factor(parent$data$classes[te])
  ted =
    data.frame(parent$data$data[te,as.numeric(chr)])
 pred=predict(trm,newdata = cbind(ted,tey=tey),type = "raw")
  if(res){
    roc(tey,pred,levels=levels(tey),
        direction = "<")$auc
  }
  else{
    predict(trm,newdata=cbind(ted,tey=tey),type="class")
  }

}




reg.bb = configBB.VarSel(data=t(mydata[,-ncol(mydata)]), 
                          classes=admit ,
                          classification.method="user", 
                          classification.userFitnessFunc=reg.fitness, 
                          chromosomeSize=2 ,niches=1, maxSolutions=10,
                          goalFitness = 0.9, saveVariable="reg.bb",
                          saveFrequency=50, saveFile="reg.bb.Rdata", 
                          main="Logistic")
blast(reg.bb)

plot(blast(reg.bb))
answered 2020-04-13T05:59:26.580

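For the ANN in the example, I used the nnet function, which has a single hidden layer; the numbers of inputs and outputs are determined by the dimensions of the data. See the help page for nnet for details. You can also use an ANN of any other structure by replacing that function with another one.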
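As a small illustration of that point, the sketch below fits nnet on simulated data and inspects the resulting layer sizes. The data and variable names are made up for the example, and it assumes the nnet package is installed and that the fitted object exposes its layer sizes as fit$n.

```r
library(nnet)

set.seed(7)
n <- 120
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))       # two simulated inputs
d$y <- factor(rbinom(n, 1, plogis(d$x1 - d$x2)))    # two-class outcome

# size sets the number of units in the single hidden layer.
fit <- nnet(y ~ x1 + x2, data = d, size = 3, trace = FALSE)

layers <- fit$n               # units per layer: inputs, hidden, outputs
n_weights <- length(fit$wts)  # total number of fitted weights
```

With two inputs, size = 3 and a two-class outcome, this gives a 2-3-1 network; changing size changes only the hidden layer.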

answered 2020-06-28T22:04:37.783

@Z. Zhang

reg.fitness = function(chr, parent,tr,te,res) {
  try=as.factor(parent$data$classes[tr])
  trd =
    data.frame(parent$data$data[tr,as.numeric(chr)])
  trm = nnet::nnet(try ~ ., data = cbind(trd,try=try),trace=F,
                    size = 3)
  tey = as.factor(parent$data$classes[te])
  ted =
    data.frame(parent$data$data[te,as.numeric(chr)])
 pred=predict(trm,newdata = cbind(ted,tey=tey),type = "raw")
  if(res){
    roc(tey,pred,levels=levels(tey),
        direction = "<")$auc
  }
  else{
    predict(trm,newdata=cbind(ted,tey=tey),type="class")
  }

}




reg.bb = configBB.VarSel(data=t(data_set[,-ncol(data_set)]), 
                          classes=data_set$y, 
                          classification.method="user", 
                          classification.userFitnessFunc=reg.fitness, 
                          chromosomeSize=3, niches=1, maxSolutions=10,
                          goalFitness = 0.9, saveVariable="reg.bb",
                          saveFrequency=50, saveFile="reg.bb.Rdata", 
                          main="Logistic")
blast(reg.bb)

This is the output I got for the first 4 iterations:

[Bb] Starting, Solutions=10
[Bb]    #bb Sol Last    Fitness %Fit    Gen Time    Elapsed Total   Remaining

[e] Starting: Fitness Goal=0.9, Generations=(10 : 200)
[e] Elapsed Time    Generation  Fitness %Fit    [Next Generations]
[e] 0h 0m 0s    (m) 0   1   111.11% +GGGGGGGGGG
[e] 0h 0m 9s    *** 11  1   111.11% FINISH: 1 2 1 

[Bb]    1   1   Sol Ok  1   111.11% 11  9.33s   9s  10s 42s (0h 0m 42s )

[e] Starting: Fitness Goal=0.9, Generations=(10 : 200)
[e] Elapsed Time    Generation  Fitness %Fit    [Next Generations]
[e] 0h 0m 0s    (m) 0   1   111.11% +GGGGGGGGGG
[e] 0h 0m 10s   *** 11  1   111.11% FINISH: 1 1 3 

[Bb]    2   2   Sol Ok  1   111.11% 11  10.35s  20s 22s 50s (0h 0m 50s )

[e] Starting: Fitness Goal=0.9, Generations=(10 : 200)
[e] Elapsed Time    Generation  Fitness %Fit    [Next Generations]
[e] 0h 0m 0s    (m) 0   1   111.11% +GGGGGGGGGG
[e] 0h 0m 10s   *** 11  1   111.11% FINISH: 3 1 1 

[Bb]    3   3   Sol Ok  1   111.11% 11  9.93s   30s 34s 50s (0h 0m 50s )

[e] Starting: Fitness Goal=0.9, Generations=(10 : 200)
[e] Elapsed Time    Generation  Fitness %Fit    [Next Generations]
[e] 0h 0m 0s    (m) 0   1   111.11% +GGGGGGGGGG
[e] 0h 0m 10s   *** 11  1   111.11% FINISH: 1 2 2 

[Bb]    4   4   Sol Ok  1   111.11% 11  10s 40s 45s 45s (0h 0m 45)

Plotting all 1000 iterations, every one of them gives the same fitness function value of 1.

answered 2020-04-10T06:49:59.187