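I would like to identify groups that contain an indicator. In the example below, I want to identify the districts that contain county == 'other'. If a district has a row with county == 'other', then I want every row in that district to get an indicator variable of 1, and otherwise 0.

Below are several attempts using split, lapply and any with county == 'other', none of which work. Perhaps I could extract all rows where county == 'other', define an indicator for that subset, and then merge the subset back into the original data set, but I keep thinking there must be an easier way. Thank you for any advice.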
df.1 <- read.table(text = '
state district county apples
AA EC AB 100
AA EC BC 10
AA EC DC 150
AA C FG 200
AA C other 20
AA C HC 250
AA WC RT 300
AA WC TT 30
AA WC other 350
', header=TRUE, stringsAsFactors = FALSE)
desired.result <- read.table(text = '
state district county apples indicator
AA EC AB 100 0
AA EC BC 10 0
AA EC DC 150 0
AA C FG 200 1
AA C other 20 1
AA C HC 250 1
AA WC RT 300 1
AA WC TT 30 1
AA WC other 350 1
', header=TRUE, stringsAsFactors = FALSE)
# various attempts that do not work
with(df.1, lapply(split(county, district), function(x) {any(x)=='county' <- 1} ))
with(df.1, lapply(split(county, district), function(x) {ifelse(any(x)=='other', 1, 0)} ))
with(df.1, lapply(split(county, district), function(x) {any(x)=='other'} ))
with(df.1, lapply(split(df.1 , district), function(x) {any(x$county)=='other'} ))
with(df.1, lapply(split(county, district), function(x) {x=='other'} ))
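One possible reason the attempts above fail: in any(x)=='other' the call to any() receives the character vector itself rather than the logical comparison. A minimal sketch of a corrected version follows, assuming the comparison is moved inside any() and the per-district result is mapped back to the rows by name. The name per.district is introduced here for illustration only, and grouping is by district alone, which is enough for this single-state example.

```r
df.1 <- read.table(text = '
state district county apples
AA EC AB 100
AA EC BC 10
AA EC DC 150
AA C FG 200
AA C other 20
AA C HC 250
AA WC RT 300
AA WC TT 30
AA WC other 350
', header = TRUE, stringsAsFactors = FALSE)

# One TRUE/FALSE per district: does any county in it equal 'other'?
# (per.district is a hypothetical helper name, not from the original post)
per.district <- with(df.1, sapply(split(county, district),
                                  function(x) any(x == 'other')))

# Look up each row's district in the named vector and convert to 0/1
df.1$indicator <- as.integer(per.district[df.1$district])

df.1$indicator
# 0 0 0 1 1 1 1 1 1
```

If district codes repeat across states, the grouping key would need to combine state and district.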
EDIT
Here is the subset/merge approach I mentioned above:
df.indicator <- df.1[df.1$county == 'other',]
df.indicator <- df.indicator[,1:2]
df.indicator$indicator = 1
merge(df.1, df.indicator, by=c('state', 'district'), all=TRUE)
I would prefer to use base R.
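As one base R possibility, ave() can compute a group-wise value and return it aligned with the original rows, which avoids the separate subset and merge step. The sketch below is only an illustration under the assumption that a group is defined by the state and district pair, as in the merge above. Pasting the two columns into a single key is a choice made here, not something from the original post.

```r
df.1 <- read.table(text = '
state district county apples
AA EC AB 100
AA EC BC 10
AA EC DC 150
AA C FG 200
AA C other 20
AA C HC 250
AA WC RT 300
AA WC TT 30
AA WC other 350
', header = TRUE, stringsAsFactors = FALSE)

# 1 for rows where county is 'other', 0 elsewhere; the group-wise max
# is then 1 for every row of a group that has at least one such row
df.1$indicator <- with(df.1, ave(as.integer(county == 'other'),
                                 paste(state, district),
                                 FUN = max))

df.1$indicator
# 0 0 0 1 1 1 1 1 1
```

Unlike the merge version, rows in groups without 'other' receive 0 directly rather than NA.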