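I am trying to run a churn analysis with R and SQL Server 2016. I uploaded my dataset to a database on a local SQL Server and did all the preliminary work on it. I now have this function, trainModel(), which I use to fit my random forest model: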

trainModel = function(sqlSettings, trainTable) {
  # RevoScaleR provides RxSqlServerData, rxGetVarNames and rxDForest
  library(RevoScaleR)

  sqlConnString = sqlSettings$connString

  # SQL Server data source for the training table
  # (cdrColInfo is assumed to be defined outside this function)
  trainDataSQL <- RxSqlServerData(connectionString = sqlConnString,
                                  table = trainTable,
                                  colInfo = cdrColInfo)

  ## Create training formula: churn ~ all other columns
  labelVar = "churn"
  trainVars <- rxGetVarNames(trainDataSQL)
  trainVars <- trainVars[!trainVars %in% c(labelVar)]
  temp <- paste(c(labelVar, paste(trainVars, collapse = "+")), collapse = "~")
  formula <- as.formula(temp)

  ## Train a decision forest with rxDForest on the SQL data source
  rx_forest_model <- rxDForest(formula = formula,
                               data = trainDataSQL,
                               nTree = 8,
                               maxDepth = 16,
                               mTry = 2,
                               minBucket = 1,
                               replace = TRUE,
                               importance = TRUE,
                               seed = 8,
                               parms = list(loss = c(0, 4, 1, 0)))

  return(rx_forest_model)
}

But when I run this function, I get the following output, which includes a warning:

> system.time({
+   trainModel(sqlSettings, trainTable)
+ })
   user  system elapsed 
   0.29    0.07   58.18 
Warning message:
In tempGetNumObs(numObs) :
  Number of observations not available for this data source. 'numObs' set to 1e6.

Along with this warning message, the trainModel() function does not create the rx_forest_model object.

Does anyone have a suggestion for how to solve this?

1 Answer

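After several attempts, I found out why trainModel() was not working properly. It is not a connection string problem, nor a data source type problem. The problem is in the syntax of the trainModel() function.

It is enough to remove the following statement from the function body: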

return(rx_forest_model)

With this change, the function still produces the same warning message, but the rx_forest_model object is created correctly.
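
As a side note, one thing that may be worth checking in a situation like this: in R, a variable assigned inside a function is local to that call, so the model is only available afterwards if the caller assigns the function's result to a name. The following is a minimal base R sketch of that behavior, not the code from the question. fitStub and local_model are hypothetical names, and lm() on the built-in mtcars data stands in for rxDForest() so that no SQL Server connection is needed.

```r
# Hypothetical stand-in for trainModel(): fits a small model on built-in data.
fitStub <- function(data) {
  local_model <- lm(mpg ~ wt, data = data)  # exists only inside this call
  local_model                               # value of the last expression is returned
}

# Calling without assignment discards the result; nothing named
# local_model is created in the caller's environment.
invisible(fitStub(mtcars))
print(exists("local_model"))

# Assigning the result keeps the model, even inside system.time().
timing <- system.time({
  model <- fitStub(mtcars)
})
print(class(model))
```

Under that assumption, the equivalent call for the function in this thread would look like model <- trainModel(sqlSettings, trainTable), with the assignment placed inside the system.time() braces if timing is still wanted.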

So the correct function is:

trainModel = function(sqlSettings, trainTable) {
  # RevoScaleR provides RxSqlServerData, rxGetVarNames and rxDForest
  library(RevoScaleR)

  sqlConnString = sqlSettings$connString

  # SQL Server data source for the training table
  # (cdrColInfo is assumed to be defined outside this function)
  trainDataSQL <- RxSqlServerData(connectionString = sqlConnString,
                                  table = trainTable,
                                  colInfo = cdrColInfo)

  ## Create training formula: churn ~ all other columns
  labelVar = "churn"
  trainVars <- rxGetVarNames(trainDataSQL)
  trainVars <- trainVars[!trainVars %in% c(labelVar)]
  temp <- paste(c(labelVar, paste(trainVars, collapse = "+")), collapse = "~")
  formula <- as.formula(temp)

  ## Train a decision forest with rxDForest on the SQL data source.
  ## The assignment is the last expression, so its value is returned invisibly.
  rx_forest_model <- rxDForest(formula = formula,
                               data = trainDataSQL,
                               nTree = 8,
                               maxDepth = 16,
                               mTry = 2,
                               minBucket = 1,
                               replace = TRUE,
                               importance = TRUE,
                               seed = 8,
                               parms = list(loss = c(0, 4, 1, 0)))

}
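
The formula-building step in the function above can be tried on its own, without a SQL Server connection. The sketch below assumes a made-up set of column names (allVars is hypothetical and stands in for the result of rxGetVarNames(trainDataSQL)) and compares the paste-based construction with base R's reformulate(), which builds the same kind of formula from a vector of term labels.

```r
# Hypothetical column names standing in for rxGetVarNames(trainDataSQL)
allVars <- c("churn", "age", "calls", "plan")
labelVar <- "churn"

# Every column except the label becomes a predictor
trainVars <- setdiff(allVars, labelVar)

# Construction in the style of the function: paste the pieces into "label~x1+x2+..."
pasted <- as.formula(paste(labelVar, paste(trainVars, collapse = "+"), sep = "~"))

# Base R alternative: reformulate() assembles the formula from the term labels
built <- reformulate(trainVars, response = labelVar)

print(pasted)
print(built)
```

Both objects describe the same model terms, so either could be passed as the formula argument; reformulate() simply avoids the manual string handling.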
answered 2017-06-08T15:51:57.460