I have a data set with 3 variables and 7320 obs: two age groups and the number of contacts between them. E.g.:

ageGroup ageGroup1  mij
0   0   0.012093847617507
0   1   0.00510485237464309
0   2   0.00374919082969427
0   3   0.00307241431437433
0   4   0.00254487083293498
0   5   0.00213734013959765
0   6   0.00182565778959543
0   7   0.00159036659169942
1   0   0.00475097494199872
1   1   0.00748329237103462
1   2   0.00427123298868537
1   3   0.00319622224196792
1   4   0.00287522072903812
1   5   0.00257773394696414
1   6   0.00230322568677366
1   7   0.00205265986733139

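And so on, up to 86. I have to calculate the mean of the contact numbers (mij) between ageGroups: for example, the mij for ageGroup = 0 in contact with ageGroup1 = 1, and the mij for ageGroup = 1 in contact with ageGroup1 = 0. I need to sum these two values and divide by 2 to get the average between them. Would you be so kind as to give me a hint on how to do this across all the data?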

2 Answers

Use ddply from the plyr package (assuming your data frame is called data):

library(plyr)
ddply(data, .(ageGroup, ageGroup1), summarize, sum.mij = sum(mij))

   ageGroup ageGroup1     sum.mij
1         0         0 0.012093848
2         0         1 0.005104852
3         0         2 0.003749191
4         0         3 0.003072414
5         0         4 0.002544871
6         0         5 0.002137340
7         0         6 0.001825658
8         0         7 0.001590367
9         1         0 0.004750975
10        1         1 0.007483292
11        1         2 0.004271233
12        1         3 0.003196222
13        1         4 0.002875221
14        1         5 0.002577734
15        1         6 0.002303226
16        1         7 0.002052660
answered 2013-07-09T11:17:29.330

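I think I understand what you are trying to do here. You want to treat the interaction between the two ageGroup columns as non-directional and get the mean interaction? The code below should do this using base R functions.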

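Note that because the example data set is truncated, it will only give the correct answer for the group with index 01. However, if you run it on the full data set, it should work for all interactions.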

# Create the data frame
df <- read.table(header=TRUE, text="
ageGroup,ageGroup1,mij
0,0,0.012093848
0,1,0.005104852
0,2,0.003749191
0,3,0.003072414
0,4,0.002544871
0,5,0.00213734
0,6,0.001825658
0,7,0.001590367
1,0,0.004750975
1,1,0.007483292
1,2,0.004271233
1,3,0.003196222
1,4,0.002875221
1,5,0.002577734
1,6,0.002303226
1,7,0.00205266
",sep=",")
df

# Using the strSort function from this SO answer: 
# http://stackoverflow.com/questions/5904797/how-to-sort-letters-in-a-string-in-r
strSort <- function(x)
        sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")

# Label each of the i-j interactions and j-i interactions with an index ij
# e.g. anything in ageGroup=1 interacting with ageGroup1=0 OR ageGroup=0 interacting with ageGroup1=1 
# are labelled with index 01 
df$ind=strSort(paste(df$ageGroup,df$ageGroup1,sep=""))

# Use the tapply function to get mean interactions for each group as suggested by Paul
tapply(df$mij,df$ind,mean)
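
One caveat with the character-sorting index: because it pastes the two numbers together and then sorts single characters, pairs with multi-digit groups can end up with the same label (for instance, 12 with 3 and 1 with 23 both become "123"). A minimal sketch of an alternative that orders each pair numerically with pmin/pmax is shown below. The column names lo and hi and the four-row sample are only illustrative, not part of the original data.

```r
# Small sample taken from the question (ageGroup 0 and 1 only)
df <- data.frame(
  ageGroup  = c(0, 0, 1, 1),
  ageGroup1 = c(0, 1, 0, 1),
  mij       = c(0.012093848, 0.005104852, 0.004750975, 0.007483292)
)

# Order each pair numerically so that (i, j) and (j, i) share one key
# (lo and hi are hypothetical helper columns)
df$lo <- pmin(df$ageGroup, df$ageGroup1)
df$hi <- pmax(df$ageGroup, df$ageGroup1)

# Mean of mij within each unordered pair
sym <- aggregate(mij ~ lo + hi, data = df, FUN = mean)
sym
```

For the 0-1 pair this gives (0.005104852 + 0.004750975) / 2 = 0.0049279135, which is the sum-and-divide-by-2 average asked for in the question. Diagonal pairs such as 0-0 have a single row, so their mean is just the original value.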
answered 2013-07-09T10:59:15.957