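I have a data frame with some data about individuals (ID), with one row for each year that they lived. It also contains the parent's ID (P.ID) and the parent's age at the individual's birth (P.AB).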
# Dataframe A: 1 row per individual
dfA <- data.frame(
  "ID" = c("A", "B", "C", "D", "E"),
  "P.ID" = c(NA, "A", "A", "B", "B"),
  "P.AB" = c(NA, 3, 4, 2, 4),
  "LS" = c(5, 6, 3, 4, 5))
# Dataframe B: 1 row per year of life
dfB <- data.frame("ID" = rep(dfA[,'ID'], dfA[,'LS']+1))
dfB <- merge(dfB, dfA, by = "ID")
dfB[ ,'AGE'] <- 0
for(i in 2:length(dfB[,1])){
  if(dfB[i,'ID'] == dfB[i-1, 'ID']){
    dfB[i,'AGE'] <- dfB[i-1, 'AGE'] + 1
  }
}
This gives:
> head(dfB)
ID P.ID P.AB LS AGE
1 A <NA> NA 5 0
2 A <NA> NA 5 1
3 A <NA> NA 5 2
4 A <NA> NA 5 3
5 A <NA> NA 5 4
6 A <NA> NA 5 5
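For reference, the same one-row-per-year layout can probably be built without the loop. This is only a minimal sketch in base R; dfB2 is a hypothetical name chosen here so that dfB is not overwritten.

```r
# Same example data as dfA above
dfA <- data.frame(
  ID   = c("A", "B", "C", "D", "E"),
  P.ID = c(NA, "A", "A", "B", "B"),
  P.AB = c(NA, 3, 4, 2, 4),
  LS   = c(5, 6, 3, 4, 5)
)

# Repeat each row of dfA once per year of life (ages 0 to LS)
dfB2 <- dfA[rep(seq_len(nrow(dfA)), dfA$LS + 1), ]

# sequence(n) yields 1..n for each element of n; subtract 1 so ages start at 0
dfB2$AGE <- sequence(dfA$LS + 1) - 1

# Reset the row names left over from the row repetition
rownames(dfB2) <- NULL
```

Because every row of dfA is repeated in place, no merge() is needed to bring the other columns back in.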
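What I then want is for R to put a 1 in a column REP to mark the years in which an individual reproduced. For example, B was born to A when A was 3, so the row where A is aged 3 gets a 1. I have been trying to use %in%, but I am struggling to make it work with multiple criteria. One workaround is to paste ID and age together (plus a random string, to make sure there are no false duplicates in my larger dataset), but this feels inelegant and unnecessarily complicated. Can %in% be used with multiple criteria, and if so, how?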
# Add 1 where an individual reproduced
dfB[,'REP'] <- 0
dfB[,'T1'] <- paste0(dfB[,'AGE'], "abcdefghijk656hjhjhj", dfB[,'ID'])
dfB[,'T2'] <- paste0(dfB[,'P.AB'], "abcdefghijk656hjhjhj", dfB[,'P.ID'])
dfB[,'REP'][dfB[,'T1'] %in% dfB[,'T2']] <- 1
dfB[,'T2'] <- dfB[,'T1'] <- NULL
dfB
The output would look like this:
> dfB
ID P.ID P.AB LS AGE REP
1 A <NA> NA 5 0 0
2 A <NA> NA 5 1 0
3 A <NA> NA 5 2 0
4 A <NA> NA 5 3 1
5 A <NA> NA 5 4 1
6 A <NA> NA 5 5 0
7 B A 3 6 0 0
8 B A 3 6 1 0
9 B A 3 6 2 1
10 B A 3 6 3 0
11 B A 3 6 4 1
12 B A 3 6 5 0
13 B A 3 6 6 0
14 C A 4 3 0 0
15 C A 4 3 1 0
16 C A 4 3 2 0
17 C A 4 3 3 0
18 D B 2 4 0 0
19 D B 2 4 1 0
20 D B 2 4 2 0
21 D B 2 4 3 0
22 D B 2 4 4 0
23 E B 4 5 0 0
24 E B 4 5 1 0
25 E B 4 5 2 0
26 E B 4 5 3 0
27 E B 4 5 4 0
28 E B 4 5 5 0
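I tried the following (and some variants). It gets close: the 1s go to the correct individuals, but in the wrong years. It sees that A and B both reproduce and that reproduction happens at ages 2, 3 and 4 (6 events in total), but not that A and B both reproduce at age 4, while A also reproduces at 3 and B also reproduces at 2 (4 events in total):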
dfB[,'REP'][dfB[,'P.ID'] %in% dfB[,'ID'] & dfB[,'P.AB'] %in% dfB[,'AGE']] <- 1
dfB[,'REP'][dfB[,'ID'] %in% dfB[,'P.ID'] & dfB[,'AGE'] %in% dfB[,'P.AB'] ] <- 1
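The over-matching seems to come from each %in% being evaluated on its own column: the test asks whether the ID occurs anywhere in P.ID and whether the AGE occurs anywhere in P.AB, not whether the two occur together in the same row. One possible way to test the pair jointly, sketched here in base R, is a left join on both columns with merge(). The names events, loose and res are hypothetical, and dfB is rebuilt so that the snippet runs on its own.

```r
# Rebuild the example data (same values as dfA/dfB above)
dfA <- data.frame(
  ID   = c("A", "B", "C", "D", "E"),
  P.ID = c(NA, "A", "A", "B", "B"),
  P.AB = c(NA, 3, 4, 2, 4),
  LS   = c(5, 6, 3, 4, 5)
)
dfB <- dfA[rep(seq_len(nrow(dfA)), dfA$LS + 1), ]
dfB$AGE <- sequence(dfA$LS + 1) - 1
rownames(dfB) <- NULL

# Two independent %in% tests: flags 6 rows (A and B at ages 2, 3 and 4)
loose <- dfB$ID %in% dfB$P.ID & dfB$AGE %in% dfB$P.AB

# One row per distinct (parent, age at reproduction) pair, NA parents dropped
events <- unique(na.omit(data.frame(ID = dfA$P.ID, AGE = dfA$P.AB)))
events$REP <- 1

# Left join on BOTH columns, so ID and AGE must match within the same row
res <- merge(dfB, events, by = c("ID", "AGE"), all.x = TRUE)
res$REP[is.na(res$REP)] <- 0
res <- res[order(res$ID, res$AGE), ]
```

Note that merge() puts the key columns ID and AGE first, so the column order differs from the desired output shown earlier.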
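As an extension to this, I would like the number of offspring at each age rather than just 1 or 0. The following works (I changed dfA so that B and C are twins), but it is probably also inefficient: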
# Counts of offspring per year
dfA[,'PASTED'] <- paste0(dfA[,'P.ID'], "randomtext", dfA[,'P.AB'])
# Create rep column
dfB[,'REP'] <- 0
# Paste together ID and AGE columns to give unique row identifiers
dfB[,'T1'] <- paste0(dfB[,'AGE'], "randomtext", dfB[,'ID'])
dfB[,'T2'] <- paste0(dfB[,'P.AB'], "randomtext", dfB[,'P.ID'])
# Add Reps
dfB[,'REP'][dfB[,'T1'] %in% dfB[,'T2']] <- table(dfA[,'PASTED'])
# Remove excess columns
dfB[,'T2'] <- dfB[,'T1'] <- NULL
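The assignment from table() above appears to rely on the counts coming back in the same order as the matching rows of dfB, which may not hold in a larger dataset. A possibly more robust sketch counts offspring per (parent, age) pair with aggregate() and joins the counts on both columns. It assumes that "twins" means C's P.AB is changed to 3; the names kids, counts and res are hypothetical.

```r
# Example data, with the assumption that B and C are twins born when A was 3
dfA <- data.frame(
  ID   = c("A", "B", "C", "D", "E"),
  P.ID = c(NA, "A", "A", "B", "B"),
  P.AB = c(NA, 3, 3, 2, 4),
  LS   = c(5, 6, 3, 4, 5)
)
dfB <- dfA[rep(seq_len(nrow(dfA)), dfA$LS + 1), ]
dfB$AGE <- sequence(dfA$LS + 1) - 1
rownames(dfB) <- NULL

# One row per offspring, keyed by parent ID and parent age; NA parents dropped
kids <- na.omit(data.frame(ID = dfA$P.ID, AGE = dfA$P.AB))
kids$REP <- 1

# Sum the offspring within each (ID, AGE) pair
counts <- aggregate(REP ~ ID + AGE, data = kids, FUN = sum)

# Left join on both columns; years without offspring get 0
res <- merge(dfB, counts, by = c("ID", "AGE"), all.x = TRUE)
res$REP[is.na(res$REP)] <- 0
res <- res[order(res$ID, res$AGE), ]
```

Because the counts are matched by key rather than by position, the result should not depend on how the rows of dfA or dfB happen to be sorted.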