0

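I am new to R; I am used to working in SAS. I have a dataset with many variables, three of which are age, sex, and agegroup. I am trying to generate summary statistics (mean, median, Q1-Q3, sd) for age within each combination of sex and agegroup. That is, summary statistics for the age of females (sex=0) in agegroup 1, then agegroup 2, and so on, and the same for males (sex=1).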

In SAS, I would use:

proc univariate data=mydata;  
var age;  
class agegroup;  
class sex;  
run;

What would the equivalent be in R?

Also, what is the R equivalent of SAS's npar1way? For example:

proc npar1way data=mydata;  
where minutes ne 9;  
var minutes;  
class sex;  
run;

Here minutes must not equal 9, because 9 is the code for a missing value. How do I do this in R?

4

3 Answers

2
# In R, missing values are denoted by "NA" instead of the number 9.
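
Since the question codes missing minutes as 9, here is a minimal sketch of the recoding step followed by a rank-based comparison. The data frame `mydata` and its values are made up for illustration. `wilcox.test` and `kruskal.test` from base R's stats package cover the Wilcoxon rank-sum and Kruskal-Wallis tests, which are among the tests PROC NPAR1WAY reports, although the output is laid out differently.

```r
# Hypothetical data: minutes uses 9 as the missing-value code, as in the question
mydata <- data.frame(
  minutes = c(12, 9, 30, 25, 9, 41, 18, 22, 35, 27),
  sex     = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Recode 9 to NA so that R treats it as missing
mydata$minutes[mydata$minutes == 9] <- NA

# Two groups: Wilcoxon rank-sum test
# (the formula interface drops rows with NA by default)
wilcox_result <- wilcox.test(minutes ~ sex, data = mydata)
wilcox_result

# Kruskal-Wallis test, which also works with more than two groups
kruskal_result <- kruskal.test(minutes ~ factor(sex), data = mydata)
kruskal_result
```

If the original column should stay untouched, `subset(mydata, minutes != 9)` would be another way to exclude those rows before testing.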

# save this data in a text file 
age agegroup sex
1 agegroup1 male
2 agegroup2 female
3 agegroup3 male
5 agegroup1 female
7 agegroup2 male
8 agegroup3 female
1 agegroup3 male
2 agegroup2 female
3 agegroup1 male

# Set the working directory to the location of the data file using the function 
setwd("PATH OF THE DIRECTORY")

data <- read.table("data", header=TRUE, sep=" ")
data
data$sex <- factor(data$sex, levels = c('male', 'female'), ordered=TRUE)
data$agegroup <- factor(data$agegroup, levels = c('agegroup1', 'agegroup2', 'agegroup3'), ordered=TRUE)

# Know the structure of your data
str(data)

# Summary of the data
summary(data)

# Std. Dev. of the variable "age"
std.dev.age <- sd(data$age)
std.dev.age
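
`sd(data$age)` gives the standard deviation over all rows. For the per-group figure the question asks about, one possible sketch uses `tapply`. The nine sample rows are read from an inline string here (an assumption made so the snippet runs without a file), and `sd_by_group` is just an illustrative name.

```r
# Same nine rows as the sample file above, read from an inline string
data <- read.table(text = "
age agegroup sex
1 agegroup1 male
2 agegroup2 female
3 agegroup3 male
5 agegroup1 female
7 agegroup2 male
8 agegroup3 female
1 agegroup3 male
2 agegroup2 female
3 agegroup1 male
", header = TRUE)

# Standard deviation of age within each sex/agegroup cell;
# cells with a single observation give NA
sd_by_group <- tapply(data$age,
                      list(sex = data$sex, agegroup = data$agegroup),
                      sd)
sd_by_group
```

Replacing `sd` with `mean`, `median`, or `quantile` gives the other statistics in the same layout.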

# Summary of three variables in a table form
table(data)

# Plot a dodged bar chart with age ~ sex + agegroup
library("ggplot2")

ggplot(data = data, aes(x = sex, y = age, ymin = 0, ymax = 8, fill = agegroup)) +
  geom_bar(position = "dodge", stat = "identity", width = 0.50) +
  scale_fill_manual(values = c("red", "green", "blue")) +
  labs(x = "", y = "age(years)", fill = " ")
answered 2012-09-23T15:14:42.810
2

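You can use the aggregate function in R to split the data into subsets, compute summary statistics for each subset, and return the result in a convenient form.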

# create some data
> age <- runif(100, 20, 60)
> sex <- sample(c(0, 1), 100, replace = TRUE)
> agegroup <- sample(1:3, 100, replace = TRUE)

Then you can compute the quantiles for the subsets defined by sex and agegroup:

> aggregate(x=age, by=list(sex=sex, agegroup=agegroup), FUN="quantile")
  sex agegroup     x.0%    x.25%    x.50%    x.75%   x.100%
1   0        1 26.70523 31.75807 37.09244 46.49449 59.77582
2   1        1 20.68903 34.49182 45.66960 48.69480 54.90620
3   0        2 20.22123 33.22948 40.57074 47.32490 58.85273
4   1        2 23.50579 31.38165 35.69254 45.13376 50.68572
5   0        3 23.46469 29.72909 42.53047 46.93867 58.30279
6   1        3 20.64256 27.22600 39.70127 48.66251 59.61565

or compute the mean:

> aggregate(x=age, by=list(sex=sex, agegroup=agegroup), FUN="mean")
  sex agegroup        x
1   0        1 39.95470
2   1        1 41.53341
3   0        2 40.53606
4   1        2 37.32189
5   0        3 40.68784
6   1        3 38.74829

The same pattern works for the standard deviation, the variance, or any other statistic you want to compute for each subset.
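
To get several statistics in one call, a possible sketch passes a small helper to `aggregate` through its formula interface. The helper `stats`, the data frame `d`, and the seed are assumptions added for illustration.

```r
# Simulated data with the same shape as above (seed added for reproducibility)
set.seed(1)
age <- runif(100, 20, 60)
sex <- sample(c(0, 1), 100, replace = TRUE)
agegroup <- sample(1:3, 100, replace = TRUE)
d <- data.frame(age, sex, agegroup)

# Hypothetical helper that returns several statistics at once
stats <- function(x) {
  c(mean   = mean(x),
    median = median(x),
    q1     = unname(quantile(x, 0.25)),
    q3     = unname(quantile(x, 0.75)),
    sd     = sd(x))
}

# One row per sex/agegroup combination; the statistics come back as a matrix column
res <- aggregate(age ~ sex + agegroup, data = d, FUN = stats)

# Flatten the matrix column into ordinary columns (age.mean, age.median, ...)
res <- do.call(data.frame, res)
res
```

The flattening step is optional, but it makes the result easier to filter or export than a data frame with an embedded matrix.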

answered 2012-09-23T15:41:17.067
1
# make some test data
age <- runif(100, 20, 60)
sex <- sample(c(0, 1), 100, replace = TRUE)
agegroup <- sample(1:3, 100, replace = TRUE)
test <- data.frame(age,sex,agegroup)

# define a new summary function to include the SD as well
# otherwise you will just get mean,median,min,max,Q1-Q3.
newsummary <- function(x) {c(summary(x),SD=sd(x))}

# get the summary stats by each agegroup/sex combo
by(test$age,test[c("sex","agegroup")],newsummary)

The result looks like this; the output is printed in a list-like format, one block per sex/agegroup combination.

> by(test$age,test[c("sex","agegroup")],newsummary)
sex: 0
agegroup: 1
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       SD 
22.07000 27.72000 38.36000 38.41000 48.02000 54.93000 11.50681 
------------------------------------------------------------ 
sex: 1
agegroup: 1
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       SD 
24.36000 38.20000 44.96000 44.55000 52.95000 58.03000 10.70105 
------------------------------------------------------------ 
sex: 0
agegroup: 2
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       SD 
21.52000 28.54000 36.75000 38.52000 49.45000 57.12000 12.26674 
------------------------------------------------------------ 
sex: 1
agegroup: 2
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.      SD 
20.0900 26.9900 31.7700 35.9800 44.6200 57.3500 11.9548 
------------------------------------------------------------ 
sex: 0
agegroup: 3
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.      SD 
20.5100 30.4300 39.6300 39.4100 47.4100 57.6000 11.9816 
------------------------------------------------------------ 
sex: 1
agegroup: 3
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.       SD 
20.04000 25.01000 36.03000 37.58000 47.81000 59.65000 13.14822 
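
If a single table is easier to work with than the list-style printout, one option is to combine `split` and `sapply`. This is only a sketch: the seed and the names `groups` and `summary_table` are assumptions added for illustration.

```r
# Same kind of test data as above (seed added for reproducibility)
set.seed(1)
age <- runif(100, 20, 60)
sex <- sample(c(0, 1), 100, replace = TRUE)
agegroup <- sample(1:3, 100, replace = TRUE)
test <- data.frame(age, sex, agegroup)

newsummary <- function(x) c(summary(x), SD = sd(x))

# split() builds one age vector per sex/agegroup combination,
# named "<sex>.<agegroup>", e.g. "0.1"
groups <- split(test$age, test[c("sex", "agegroup")])

# sapply() returns one column per group; t() puts the groups in rows
summary_table <- t(sapply(groups, newsummary))
summary_table
```

Each row name combines the sex and agegroup values, so the row for females in agegroup 1 is labelled `0.1`.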
answered 2012-09-23T22:42:44.153