
The data is stored in a .txt file, with all 200 words in the same file. How do I read this raw data into R and run a binary logistic regression for each of these words?

num 0 0.010752688172
num 0 0.003300330033

thanksgiving 0 0.0123456790123
thanksgiving 0 0.0016339869281
thanksgiving 0 0.00338983050847

off 0 0.00431034482759
off 0 0.00302114803625
off 1 0.001100110011
off 0 0.00377358490566
off 1 0.00166112956811
off 1 0.00281690140845
off 0 0.00564971751412
off 0 0.00112994350282
off 0 0.003300330033
off 0 0.0042735042735
off 1 0.00326797385621
off 0 0.00159489633174
off 0 0.00378787878788
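
A minimal sketch of the reading step, assuming the file (called `words.txt` here; the name is hypothetical) is whitespace-separated, has no header row, and holds the word, the 0/1 outcome and the numeric predictor in that order. `read.table()` skips the blank lines between word groups by default (`blank.lines.skip = TRUE`). So that the snippet runs on its own, it reads the sample rows above from a string via `text =`:

```r
# Sample rows from the question, blank lines included.
# With the real file, use read.table("words.txt", ...) instead of text = raw.
raw <- "
num 0 0.010752688172
num 0 0.003300330033

thanksgiving 0 0.0123456790123
thanksgiving 0 0.0016339869281
thanksgiving 0 0.00338983050847

off 0 0.00431034482759
off 0 0.00302114803625
off 1 0.001100110011
off 0 0.00377358490566
off 1 0.00166112956811
off 1 0.00281690140845
off 0 0.00564971751412
off 0 0.00112994350282
off 0 0.003300330033
off 0 0.0042735042735
off 1 0.00326797385621
off 0 0.00159489633174
off 0 0.00378787878788
"

# No header, so name the three columns explicitly (names are an assumption)
dat <- read.table(text = raw,
                  col.names = c("word", "y", "x"),
                  stringsAsFactors = FALSE)

str(dat)         # one row per observation
table(dat$word)  # number of rows per word
```

Once the data is in a single data frame like this, fitting one model per word is a split-and-apply problem.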

2 Answers


OK, I'm lazy, so:

# all distinct words in column 1
allwords <- unique(dataframe[,1])
# rows belonging to the first word
firstword <- dataframe[dataframe[,1]==allwords[1],]

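and so on will split your data up by word. But you don't actually need to create firstword, secondword, ..., because it's just as easy to use one of the apply functions to run the regression for each value in allwords.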
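
A minimal base-R sketch of that idea, assuming the columns are named word, y (0/1 outcome) and x (predictor); those names are an assumption, not part of the question. `split()` does the per-word subsetting that firstword, secondword, ... would do by hand, and `lapply()` fits one `glm(..., family = binomial)` per piece. Words with only a few rows and no 1s (num and thanksgiving in the sample) give degenerate fits with very large standard errors.

```r
# Sample rows from the question; with the real file use read.table("words.txt", ...)
dat <- read.table(text = "
num 0 0.010752688172
num 0 0.003300330033
thanksgiving 0 0.0123456790123
thanksgiving 0 0.0016339869281
thanksgiving 0 0.00338983050847
off 0 0.00431034482759
off 0 0.00302114803625
off 1 0.001100110011
off 0 0.00377358490566
off 1 0.00166112956811
off 1 0.00281690140845
off 0 0.00564971751412
off 0 0.00112994350282
off 0 0.003300330033
off 0 0.0042735042735
off 1 0.00326797385621
off 0 0.00159489633174
off 0 0.00378787878788
", col.names = c("word", "y", "x"), stringsAsFactors = FALSE)

# split() returns a list of data frames, one per distinct word, named by the word
by_word <- split(dat, dat$word)

# Fit one logistic regression per word; no firstword/secondword objects needed.
# glm() may warn for words whose y is all 0, because those fits are degenerate.
models <- lapply(by_word, function(d) glm(y ~ x, family = binomial, data = d))

names(models)             # one fitted model per word
summary(models[["off"]])  # inspect a single word's model
```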

answered 2012-06-15T19:29:07.807

Here's how I'd do it with the plyr package:

# Load the plyr library
library(plyr)

# Read in the data
allwords <- read.table("words.txt")

# Give the columns more meaningful names than the defaults (V1, V2, V3)
names(allwords) <- c("word", "y", "x")

# dlply() splits the data.frame by "word" and calls glm() on each piece,
# passing formula = y ~ x and family = binomial through; each piece becomes
# glm()'s data argument. It returns a named list of the fitted glm objects.
models <- dlply(allwords,
                .variables = "word",
                .fun = glm, formula = y ~ x, family = binomial)

# It's then easy to iterate over that list using lapply, llply, ldply, etc.
# (depending on what you want back out)
# Summarize:
llply(models, summary)

# Get all the coefficients
ldply(models, coef)

# Get AICs
# (AICs aren't comparable across the per-word models, since each is fit to
# different data, but you get the idea.)
ldply(models, AIC)

# Or, if you want to work with a particular model
models$num
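
To collect the per-word results into one table rather than a list of summaries, here is a base-R sketch (no plyr needed). It assumes the word/y/x column names used above and reads the estimate, standard error and Wald p-value for x from `summary()`'s coefficient matrix (columns `Estimate`, `Std. Error` and `Pr(>|z|)` for a binomial glm). For words with very few rows or no 1s (num and thanksgiving in the sample) these numbers are not meaningful.

```r
# Sample rows from the question; with the real file use read.table("words.txt", ...)
dat <- read.table(text = "
num 0 0.010752688172
num 0 0.003300330033
thanksgiving 0 0.0123456790123
thanksgiving 0 0.0016339869281
thanksgiving 0 0.00338983050847
off 0 0.00431034482759
off 0 0.00302114803625
off 1 0.001100110011
off 0 0.00377358490566
off 1 0.00166112956811
off 1 0.00281690140845
off 0 0.00564971751412
off 0 0.00112994350282
off 0 0.003300330033
off 0 0.0042735042735
off 1 0.00326797385621
off 0 0.00159489633174
off 0 0.00378787878788
", col.names = c("word", "y", "x"), stringsAsFactors = FALSE)

# One logistic regression per word, as a named list
models <- lapply(split(dat, dat$word),
                 function(d) glm(y ~ x, family = binomial, data = d))

# One row per word: slope on x, its standard error and Wald p-value
coef_table <- do.call(rbind, lapply(names(models), function(w) {
  cf <- summary(models[[w]])$coefficients
  data.frame(word      = w,
             estimate  = cf["x", "Estimate"],
             std_error = cf["x", "Std. Error"],
             p_value   = cf["x", "Pr(>|z|)"],
             stringsAsFactors = FALSE)
}))

coef_table
```

Building a plain data frame makes it easy to sort or filter the 200 words afterwards, e.g. `coef_table[order(coef_table$p_value), ]`.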

answered 2012-06-15T19:50:13.790