4

我在执行迭代定义的计算时遇到困难。以下数据作为示例(实际数据集更大):

## DATA ##
# Columns
   Individual<-c("A","B","C","D","E","F","G","H1","H2","H3","H4","H5","K1","K2","K3","K4","K5")
   P1<-c(0,0,"A",0,"C","C",0, rep("E",5),"H1","H2","H3","H4","H5")
   P2<-c(0,0,"B",0,"D", "E",0,rep("G",5),"H1","H2","H3","H4","H5")
# Dataframe
   myd<-data.frame(Individual,P1,P2,stringsAsFactors=FALSE)


   Individual P1 P2
1           A  0  0
2           B  0  0
3           C  A  B
4           D  0  0
5           E  C  D
6           F  C  E
7           G  0  0
8          H1  E  G
9          H2  E  G
10         H3  E  G
11         H4  E  G
12         H5  E  G
13         K1 H1 H1
14         K2 H2 H2
15         K3 H3 H3
16         K4 H4 H4
17         K5 H5 H5

数据表示 和 个人与两个父母之间的关系,P1P2

所需的计算,标记为relationA,表示每个人与 A 的相关程度。

根据定义,A 和 A 之间的关系被赋值为 1。所有其他个体的值需要根据表中的信息计算,如下所示:

The value of relationA for an individual should be equal to 
   1/2 (the value of relationA of P1 of the individual)  
 + 1/2 (the value of relationA of P2 of the individual)

例如

  Individual P1 P2      relationA
1           A  0  0       1
2           B  0  0       0
3           C  A  B       (A = 1 + B = 0)/2 = 0.5
4           D  0  0       0
5           E  C  D       (C= 0.5 + D = 0)/2 = 0.25
6           F  C  E       (C = 0.5 + E = 0.25)/2 = 0.375  

预期的输出如下:

 Individual P1 P2  relationA
1           A  0  0   1
2           B  0  0   0
3           C  A  B   0.5
4           D  0  0   0
5           E  C  D   0.25
6           F  C  E   0.375
7           G  0  0   0 
8          H1  E  G   0.125
9          H2  E  G   0.125
10         H3  E  G   0.125
11         H4  E  G   0.125
12         H5  E  G   0.125
13         K1 H1 H1   0.125
14         K2 H2 H2   0.125
15         K3 H3 H3   0.125
16         K4 H4 H4   0.125
17         K5 H5 H5   0.125

我的困难在于以适当的方式表达这一点R。任何帮助,将不胜感激。

4

2 回答 2

4

您可以编写一个函数来计算给定个体的值和(隐式)关系作为简单的递归函数。

relationA <- function(ind) {
  if(ind == "A") {
    1
  } else if (ind == "0") {
    0
  } else {
    pts <- myd[myd$Individual == ind,]
    (relationA(pts[["P1"]]) + relationA(pts[["P2"]])) / 2
  }
}

简单地说,如果个人是A,它就是1;如果个人为0,则为0;对于其他任何事情,递归调用与个人相对应的relationA每个父母(P1P2)并将它们相加并除以2。这一次仅适用于一个人:

> relationA("A")
[1] 1
> relationA("F")
[1] 0.375
> relationA("K5")
[1] 0.125

但是您可以相对容易地将其矢量化到所有个体:

> sapply(myd$Individual, relationA)
    A     B     C     D     E     F     G    H1    H2    H3    H4    H5    K1 
1.000 0.000 0.500 0.000 0.250 0.375 0.000 0.125 0.125 0.125 0.125 0.125 0.125 
   K2    K3    K4    K5 
0.125 0.125 0.125 0.125 

这可以分配myd

myd$relationA <- sapply(myd$Individual, relationA)

这不是特别有效,因为它必须relationA为每种情况一遍又一遍地计算。当它到达“K5”时,它调用reationA("H5")了两次,每次调用relationA("E")and relationA("G"),而那些调用relationA("C"),和relationA("D"),等等。也就是说,没有结果被缓存,而是每次都重新计算。对于这么小的数据集,没关系,因为即使是低效的仍然非常快。relationA("0")relationA("0")

如果您想要/需要缓存结果并使用该缓存,那么您可以修改relationA以执行此操作。

relationAc <- function(ind) {
  pts <- myd[myd$Individual == ind,]
  if(nrow(pts) == 0 | any(is.na(pts[["relationA"]]))) {
    relationA <-
      if(ind == "A") {
        1
      } else if (ind == "0") {
        0
      } else {
        (relationAc(pts[["P1"]]) + relationAc(pts[["P2"]])) / 2
      }
    myd[myd$Individual == ind, "relationA"] <<- relationA
    relationA
  } else {
    pts[["relationA"]]
  }
}

然后你必须初始化缓存:

myd$relationA <- NA_real_

一次调用将填写所需的值,调用整个个人集将导致填写所有值。

> myd
   Individual P1 P2 relationA
1           A  0  0        NA
2           B  0  0        NA
3           C  A  B        NA
4           D  0  0        NA
5           E  C  D        NA
6           F  C  E        NA
7           G  0  0        NA
8          H1  E  G        NA
9          H2  E  G        NA
10         H3  E  G        NA
11         H4  E  G        NA
12         H5  E  G        NA
13         K1 H1 H1        NA
14         K2 H2 H2        NA
15         K3 H3 H3        NA
16         K4 H4 H4        NA
17         K5 H5 H5        NA
> relationAc("K5")
[1] 0.125
> myd
   Individual P1 P2 relationA
1           A  0  0     1.000
2           B  0  0     0.000
3           C  A  B     0.500
4           D  0  0     0.000
5           E  C  D     0.250
6           F  C  E        NA
7           G  0  0     0.000
8          H1  E  G        NA
9          H2  E  G        NA
10         H3  E  G        NA
11         H4  E  G        NA
12         H5  E  G     0.125
13         K1 H1 H1        NA
14         K2 H2 H2        NA
15         K3 H3 H3        NA
16         K4 H4 H4        NA
17         K5 H5 H5     0.125
> sapply(myd$Individual, relationAc)
    A     B     C     D     E     F     G    H1    H2    H3    H4    H5    K1 
1.000 0.000 0.500 0.000 0.250 0.375 0.000 0.125 0.125 0.125 0.125 0.125 0.125 
   K2    K3    K4    K5 
0.125 0.125 0.125 0.125 
> myd
   Individual P1 P2 relationA
1           A  0  0     1.000
2           B  0  0     0.000
3           C  A  B     0.500
4           D  0  0     0.000
5           E  C  D     0.250
6           F  C  E     0.375
7           G  0  0     0.000
8          H1  E  G     0.125
9          H2  E  G     0.125
10         H3  E  G     0.125
11         H4  E  G     0.125
12         H5  E  G     0.125
13         K1 H1 H1     0.125
14         K2 H2 H2     0.125
15         K3 H3 H3     0.125
16         K4 H4 H4     0.125
17         K5 H5 H5     0.125
于 2012-11-16T18:57:09.983 回答
3

编辑:

更简洁地说,您可以在一行代码中使用sapplyrowSums取消:for-loop

# Initialize values of relationA
myd$relationA <- 0
myd$relationA[myd$Individual=="A"] <- 1

# Calculate relationA
myd$relationA <-   myd$relationA + rowSums(sapply(myd$Individual, function(indiv) 
     myd$relationA[myd$Individual==indiv]/2 * ((myd$P1==indiv) + (myd$P2==indiv))))



你正在寻找这样的东西吗?

# Initialize values of relationA
myd$relationA <- 0
myd$relationA[myd$Individual=="A"] <- 1


# Iterate over all Individuals
for (indiv in myd$Individual) {

  indiVal <- myd$relationA[myd$Individual==indiv]

  # all columns handled at once, thanks to vectorization;  no need for myd$P1[i]
  myd$relationA <- myd$relationA  + 
                 indiVal/2 * ((myd$P1==indiv) + (myd$P2==indiv))
}

输出

myd
   Individual P1 P2 relationA
1           A  0  0     1.000
2           B  0  0     0.000
3           C  A  B     0.500
4           D  0  0     0.000
5           E  C  D     0.250
6           F  C  E     0.375
7           G  0  0     0.000
8          H1  E  G     0.125
9          H2  E  G     0.125
...
于 2012-11-16T17:47:48.617 回答