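Hello, I'm new to this forum and relatively new to coding in R. I'm having trouble building a frequency table that is filled in with zero counts when some of my variables are linked. My test data consist of vegetation counts for 3 plant genera across 10 plots, with 2 treatment types and 2 subsite types. Each plot can only be either wet or dry, and either treatment or control, never both.
Here is the structure of my data: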
structure(list(SITE = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "NAKVAK", class = "factor"),
PLOT = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1A",
"1B", "2A", "2B", "3A", "3B", "4A", "4B", "5A", "5B"), class = "factor"),
PLOT2 = c(1L, 1L, 1L, 1L, 1L, 1L), SUBSITE = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("DRY", "WET"), class = "factor"),
TRTMT = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CTL",
"OTC"), class = "factor"), YEAR = c(2010L, 2010L, 2010L,
2010L, 2010L, 2021L), GENUS = structure(c(3L, 1L, 2L, 2L,
3L, 1L), .Label = c("Betula", "Carex", "Cladonia"), class = "factor"),
LIFEFORM = structure(c(2L, 3L, 1L, 1L, 2L, 3L), .Label = c("FORB",
"LICHEN", "SDECI"), class = "factor"), ABUND = c(1L, 1L,
1L, 1L, 1L, 1L)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
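I am trying to build a table that contains plot ID, subsite (wet or dry), treatment (OTC or control), year (2010 or 2021), genus, and abundance count. I also want to include zeros, so that I know when a particular genus was not found. I have been able to build a frequency table filled in with zero counts, but the problem is that it also creates combinations of variables that do not exist in my dataset. For example, plot 1A is always in a dry subsite with OTC treatment. However, the code creates a row for plot 1A, wet subsite, OTC, and another for plot 1A, dry subsite, control, and so on. I am looking for a way to link my plot, subsite, and treatment variables so that zero abundance counts are filled in only for the combinations of plot, subsite, and treatment that actually exist in my dataset.
The functions that look like my best bet are the tidyverse's expand and nesting functions, but I'm open to any and all suggestions!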
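For reference, a minimal sketch of how nesting() might be used here: tidyr's complete() treats the columns wrapped in nesting() as a unit, so only the plot/subsite/treatment combinations that actually occur in the data are expanded, while YEAR and GENUS are crossed with them. The toy data frame below is made up for illustration (it is not the real dataset), and the sketch assumes that every plot should get a row for every year and genus.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: each plot is tied to exactly one subsite and treatment
toy <- tibble(
  PLOT    = c("1A", "1A", "1B"),
  SUBSITE = c("DRY", "DRY", "WET"),
  TRTMT   = c("OTC", "OTC", "CTL"),
  YEAR    = c(2010L, 2021L, 2010L),
  GENUS   = c("Betula", "Carex", "Carex"),
  ABUND   = c(1L, 1L, 1L)
)

counts <- toy %>%
  group_by(PLOT, SUBSITE, TRTMT, YEAR, GENUS) %>%
  summarise(count = sum(ABUND), .groups = "drop") %>%
  # nesting() keeps only observed PLOT/SUBSITE/TRTMT combinations;
  # YEAR and GENUS are fully crossed with them, and missing counts become 0
  complete(nesting(PLOT, SUBSITE, TRTMT), YEAR, GENUS, fill = list(count = 0L))

print(counts)
```

With this toy input, plot 1A only ever appears with DRY and OTC, so no 1A/WET or 1A/CTL rows are created.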
**** Edited to include the solution
library(dplyr)      # as_tibble(), group_by(), summarise(), %>%
library(tidyr)      # complete()
library(data.table) # as.data.table(), :=, tstrsplit()

vegdat <- as_tibble(read.csv("DummyDat100.csv"))
# Total abundance per plot, subsite, treatment, genus, and year
spec.sum <- vegdat %>%
  group_by(PLOT, SUBSITE, TRTMT, GENUS, YEAR) %>%
  summarise(count = sum(ABUND), .groups = "drop")
Spec.sum <- as.data.table(spec.sum)
# Create a new column that combines all the site data
Spec.sum[, sampling := paste(YEAR, PLOT, SUBSITE, TRTMT, sep = '_')]
# Now get rid of the columns that were combined into the sampling column
Spec.sum <- Spec.sum[, c('sampling', 'GENUS', 'count')]
# Now expand this smaller data.table to include zeroes for absent genera
Spec.sum <- as.data.table(complete(Spec.sum, sampling, GENUS, fill = list(count = 0)))
# Split the sampling column back into the original columns
# (they come back as character; this assumes no value contains '_')
Spec.sum[, c('YEAR', 'PLOT', 'SUBSITE', 'TRTMT') := tstrsplit(sampling, '_', fixed = TRUE, keep = 1:4)]
# Remove the sampling column (if you want)
Spec.sum[, sampling := NULL]
# Reorder the columns
Spec.sum.tidy <- Spec.sum[, c('PLOT', 'SUBSITE', 'TRTMT', 'GENUS', 'YEAR', 'count')]
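As a possible alternative, the paste-and-split steps could probably be avoided by putting YEAR inside nesting() as well. The sketch below, which uses hypothetical toy data rather than DummyDat100.csv, expands GENUS only within the year/plot/subsite/treatment combinations that were actually sampled, which matches what the sampling column achieves, and it leaves YEAR as an integer rather than character.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: plot 1B was not sampled in 2021
toy2 <- tibble(
  PLOT    = c("1A", "1A", "1B"),
  SUBSITE = c("DRY", "DRY", "WET"),
  TRTMT   = c("OTC", "OTC", "CTL"),
  YEAR    = c(2010L, 2021L, 2010L),
  GENUS   = c("Betula", "Carex", "Carex"),
  ABUND   = c(1L, 2L, 1L)
)

spec_sum_tidy <- toy2 %>%
  # Sum ABUND within each sampling unit and genus
  count(YEAR, PLOT, SUBSITE, TRTMT, GENUS, wt = ABUND, name = "count") %>%
  # Add zero rows for genera absent from a sampling unit that exists in the data
  complete(nesting(YEAR, PLOT, SUBSITE, TRTMT), GENUS, fill = list(count = 0L)) %>%
  select(PLOT, SUBSITE, TRTMT, GENUS, YEAR, count)

print(spec_sum_tidy)
```

Whether unsampled plot/year pairs should get zero rows is a design choice: keeping YEAR inside nesting() leaves them out, while moving it outside would fill them with zeros.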