OK, so I have a CSV file with a structure similar to this:
hashID,value,flag
98fafd, 35, 1
fh56w2, 25, 0
ggjeas, 55, 1
adfh5d, 45, 0
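Basically, what I want to do is get the median of the value column, but only include the rows where flag==1 in the calculation.
Is this even possible in R? I've looked around and haven't found anything similar.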
You can also do this in a quick one-liner by using a logical (boolean) vector as an index into the data frame:
# read the data from a csv file
newdata <- read.csv("file.csv")
# this will give you a vector of boolean values of length nrow(newdata)
newdata$flag==1
# and this line uses the above vector to retrieve only those elements of
# newdata$value for which the row contains a flag value of 1
median(newdata$value[newdata$flag==1])
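One caveat with the one-liner above: if flag or value contains missing values, the result comes back as NA. A minimal sketch of a more defensive variant follows. It assumes the sample data from the question, read inline through the `text=` argument so that no file is needed; the variable name `flag_median` is only illustrative.

```r
# Sample data from the question, read inline so the sketch runs without a file
newdata <- read.csv(text = "hashID,value,flag
98fafd, 35, 1
fh56w2, 25, 0
ggjeas, 55, 1
adfh5d, 45, 0")

# which() keeps only the positions where the comparison is TRUE, so rows with
# an NA flag are dropped; na.rm = TRUE ignores missing entries in value
flag_median <- median(newdata$value[which(newdata$flag == 1)], na.rm = TRUE)
flag_median
# [1] 45
```

On the sample data this gives the same answer as the plain one-liner, since nothing is missing; the difference only shows up once NAs appear.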
Here is one possibility:
Read in your data set with the following command:
newdata <- read.csv("stackoverflow questions/mediancol.csv")
# I assume you have the data in csv format
# Showing the data I used for the computation
newdata <- structure(list(hashID = structure(c(1L, 3L, 4L, 2L), .Label = c("98fafd",
"adfh5d", "fh56w2", "ggjeas"), class = "factor"), value = c(35L,
25L, 55L, 45L), flag = c(1L, 0L, 1L, 0L)), .Names = c("hashID",
"value", "flag"), class = "data.frame", row.names = c(NA, -4L
))
> newdata
  hashID value flag
1 98fafd    35    1
2 fh56w2    25    0
3 ggjeas    55    1
4 adfh5d    45    0
# Subset the data to the rows where flag == 1
newdata1 <- subset(newdata,flag==1)
# Look at the summary of the data
> summary(newdata1)
    hashID      value         flag
 98fafd:1   Min.   :35   Min.   :1
 adfh5d:0   1st Qu.:40   1st Qu.:1
 fh56w2:0   Median :45   Median :1
 ggjeas:1   Mean   :45   Mean   :1
            3rd Qu.:50   3rd Qu.:1
            Max.   :55   Max.   :1
# Only look at the median
median(newdata1$value)
[1] 45
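If the median for every flag value is of interest rather than just flag==1, base R's tapply() can compute them all in one call. This is only a sketch of an alternative, not part of the answer above; it rebuilds the same four rows with data.frame(), and the name `medians_by_flag` is made up for illustration.

```r
# Same four rows as in the question, built directly as a data frame
newdata <- data.frame(
  hashID = c("98fafd", "fh56w2", "ggjeas", "adfh5d"),
  value  = c(35, 25, 55, 45),
  flag   = c(1, 0, 1, 0)
)

# Apply median() to value separately within each level of flag;
# the result is named by the flag values ("0" and "1")
medians_by_flag <- tapply(newdata$value, newdata$flag, median)

# Pull out the median for the flag == 1 group by name
medians_by_flag[["1"]]
# [1] 45
```

The subset() approach is easier to read when only one group matters; tapply() saves repeating the subsetting step when several groups are compared.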