I want to compute two kinds of frequency tables by group, using weighted data.

You can generate reproducible data with the following code:

set.seed(42)  # fix the seed so the sample really is reproducible
Data <- data.frame(
     country = sample(c("France", "USA", "UK"), 100, replace = TRUE),
     migrant = sample(c("Native", "Foreign-born"), 100, replace = TRUE),
     gender = sample(c("men", "women"), 100, replace = TRUE),
     wgt = sample(100),
     year = sample(2006:2007, 100, replace = TRUE)
     )

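First, I tried to compute a frequency table of migrant status (native vs. foreign-born) by country and year. I used the questionr and plyr packages and wrote the following code: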

db2006 <- subset (Data, year == 2006)
db2007 <- subset (Data, year == 2007)

result2006 <- as.data.frame(cprop(wtd.table(db2006$migrant, db2006$country, weights=db2006$wgt),total=FALSE))
result2007 <- as.data.frame(cprop(wtd.table(db2007$migrant, db2007$country, weights=db2007$wgt),total=FALSE))

result2006<-rename (result2006, c(Freq = "y2006"))
result2007<-rename (result2007, c(Freq = "y2007"))

result <- merge(result2006, result2007, by = c("Var1","Var2"))

My real database covers 10 years, so applying this code year by year takes a long time. Does anyone know a faster way?

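I would also like to compute the share of women and men within each migrant status, by country and year. I am looking for something like this: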

Var1            Var2     Var3     y2006   y2007
Foreign born    France   men        52     55
Foreign born    France   women      48     45
Native          France   men        51     52
Native          France   women      49     48
Foreign born    UK       men        60     65
Foreign born    UK       women      40     35
Native          UK       men        48     50
Native          UK       women      52     50

Does anyone know how I can get these results?

1 Answer

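You can do this by turning the code you have already written into a function, using lapply to apply that function to every year in your data, and then using Reduce and merge to collapse the resulting list into a single data frame. Like this: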

# let's make your code into a function called 'tallyho'
tallyho <- function(yr, data) {

  require(dplyr)
  require(questionr)

  # keep only the rows for the requested year
  DF <- filter(data, year == yr)

  # weighted column percentages of migrant status within each country
  result <- with(DF, as.data.frame(cprop(wtd.table(migrant, country, weights = wgt), total = FALSE)))

  # rename the last column by year (use the argument 'yr'; 'year' is not defined here)
  names(result)[ncol(result)] <- sprintf("y%s", yr)

  return(result)

}

# now iterate that function over all years in your original data set, then
# use Reduce and merge to collapse the resulting list into a data frame
NewData <- lapply(sort(unique(Data$year)), function(x) tallyho(x, Data)) %>%
  Reduce(function(...) merge(..., all = TRUE), .)
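
The code above covers the first table only. For the second request (the share of men and women within each migrant status, by country and year), here is a minimal sketch in base R. It is an alternative technique, not the questionr approach used above: it assumes the column names from the question's example data, builds weighted counts for all years at once with xtabs, and converts them to percentages with prop.table. The object names counts, shares, long and wide are purely illustrative.

```r
# Example data with the same columns as in the question (assumed layout)
set.seed(1)
Data <- data.frame(
  country = sample(c("France", "USA", "UK"), 100, replace = TRUE),
  migrant = sample(c("Native", "Foreign-born"), 100, replace = TRUE),
  gender  = sample(c("men", "women"), 100, replace = TRUE),
  wgt     = sample(100),
  year    = sample(2006:2007, 100, replace = TRUE)
)

# Weighted counts for every gender x migrant x country x year cell
counts <- xtabs(wgt ~ gender + migrant + country + year, data = Data)

# Percentages across gender within each migrant x country x year combination
# (margins 2, 3 and 4 are held fixed; gender is the dimension that sums to 100)
shares <- 100 * prop.table(counts, margin = c(2, 3, 4))

# Long format: one row per cell
long <- as.data.frame(shares, responseName = "pct")
long$year <- paste0("y", long$year)

# Wide format: one column per year
wide <- reshape(long,
                idvar     = c("migrant", "country", "gender"),
                timevar   = "year",
                direction = "wide")
names(wide) <- sub("^pct\\.", "", names(wide))
wide <- wide[order(wide$country, wide$migrant, wide$gender), ]
rownames(wide) <- NULL

print(wide)
```

Dropping gender from the formula and using margin = c(2, 3) should give the migrant-status percentages by country and year in the same single pass, with no per-year subsetting. Combinations with no observations come out as NaN, because their weighted total is zero.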
answered 2016-10-19T12:19:05.293