r - Classification/decision trees and choosing splits

Question

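This is a very basic example, but while doing some data analysis I keep finding myself writing very similar SQL count queries to generate probability tables.

My tables are defined so that a value of 0 means the event did not occur and a value of 1 means the event did occur.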

> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 0 and C_O_Below_prevLow = 0")
  count(distinct Date)
1                 1081

> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 0 and C_O_Below_prevLow = 0 and E_halfGap = 1")
  count(distinct Date)
1                  956

> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 1 OR C_O_Below_prevLow = 1 and E_halfGap = 1")
  count(distinct Date)
1                  504

In the example above, my predictor variables are C_O_Above_prevHigh and C_O_Below_prevLow, and my outcome variable is E_halfGap. In some cases there may be more predictors, such as Time.

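Rather than doing the above and manually typing out every query for each permutation, is there anything available in R or some other application that would:

1) output the potential probability paths given my predictors?
2) let me choose how to split the paths?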

Thanks for your input.

score 2 · Accepted Answer

If you want all the totals and subtotals, you can use GROUP BY CUBE in SQL (it is not available in SQLite, though) or addmargins in R.

addmargins( Titanic )
# More readable:
ftable( addmargins( Titanic ) )
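
Applied to the 0/1 columns from the question, the same idea might look like the sketch below. The joinedData here is simulated stand-in data, not the asker's table, and it assumes one row per Date, so that row counts match the count(distinct Date) queries. xtabs counts every combination in one pass, and prop.table turns the counts into conditional probabilities of E_halfGap given the two predictors.

```r
# Simulated stand-in for joinedData (assumption: one row per Date, 0/1 indicators).
set.seed(1)
n <- 200
joinedData <- data.frame(
  Date               = seq(as.Date("2020-01-01"), by = "day", length.out = n),
  C_O_Above_prevHigh = rbinom(n, 1, 0.3),
  C_O_Below_prevLow  = rbinom(n, 1, 0.3),
  E_halfGap          = rbinom(n, 1, 0.6)
)

# Counts for every combination of the two predictors and the outcome.
counts <- xtabs(~ C_O_Above_prevHigh + C_O_Below_prevLow + E_halfGap,
                data = joinedData)

# P(E_halfGap | predictors): normalise within each predictor combination.
probs <- prop.table(counts, margin = c(1, 2))

# Flat, readable views: counts with totals and subtotals, then probabilities.
ftable(addmargins(counts))
ftable(round(probs, 3))
```

Adding another predictor such as Time should only require adding it to the xtabs formula and to the margin argument.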

If you want to build a decision tree, you can use the rpart package, or check the Machine Learning or Graphical Models task views.
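
A minimal rpart sketch, again on simulated data rather than the asker's table: the outcome is generated so that it depends on the predictors, purely so the tree has something to split on. The splitting criterion and the stopping rules are the knobs that most directly address "choosing how to split"; the specific values below are illustrative assumptions, not recommendations.

```r
library(rpart)

# Simulated stand-in data (assumption): two 0/1 predictors and a 0/1 outcome
# whose probability depends on the predictors.
set.seed(1)
n <- 500
joinedData <- data.frame(
  C_O_Above_prevHigh = rbinom(n, 1, 0.3),
  C_O_Below_prevLow  = rbinom(n, 1, 0.3)
)
inside <- joinedData$C_O_Above_prevHigh == 0 & joinedData$C_O_Below_prevLow == 0
joinedData$E_halfGap <- factor(rbinom(n, 1, ifelse(inside, 0.9, 0.4)))

# Classification tree. parms$split selects the criterion ("gini" or
# "information"); rpart.control sets how eagerly the tree splits.
fit <- rpart(E_halfGap ~ C_O_Above_prevHigh + C_O_Below_prevLow,
             data    = joinedData,
             method  = "class",
             parms   = list(split = "information"),
             control = rpart.control(minsplit = 20, cp = 0.01))

print(fit)                                   # text view of the fitted splits
leafProbs <- predict(fit, type = "prob")     # class probabilities per row
head(leafProbs)
```

Each leaf of the printed tree corresponds to one "probability path", and the class probabilities in it play the role of the hand-written count queries.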
