Added later: sorry @alistaire, I only saw your comment on the original post after posting this answer.
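As far as I understand it, `Error: Duplicate identifiers for rows...` occurs when several values share the same identifier. For example, in the original `iris` dataset, the first five rows with Species = setosa all have a Petal.Width of 0.2, and three of those rows have a Petal.Length of 1.4. Gathering these data is not a problem, but when you try to spread them again, the function does not know what belongs to what, i.e. which setosa row a given 0.2 Petal.Width or 1.4 Petal.Length belongs to.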
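To see the failure for yourself, here is a minimal sketch. It assumes `dplyr` and `tidyr` are installed; the exact wording of the error message differs between tidyr versions, so the code only checks that an error is raised rather than matching its text. The names `gathered_no_marker` and `result` are just illustrative.

```r
library(dplyr)
library(tidyr)

# Gather the four measurement columns WITHOUT adding a row marker
gathered_no_marker <- iris %>%
  gather(key = Type, value = Values, 1:4)

# Spreading now fails: many rows share the same Species/Type combination,
# so spread() cannot tell which values belong together in one output row
result <- tryCatch(
  spread(gathered_no_marker, key = Type, value = Values),
  error = function(e) e
)

# TRUE if spread() raised an error
inherits(result, "error")
```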
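The (tidyverse) solution I use in these cases is to create a unique marker for each row of data at the gather stage, so that the function can keep track of which duplicated values belong to which rows when you want to spread them again. See the example below: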
# Load packages
library(dplyr)
library(tidyr)
# Get data
dataset <- iris
# View dataset
head(dataset)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
# Gather data
dataset_gathered <- dataset %>%
  # Create a unique identifier for each row
  mutate(marker = row_number(Species)) %>%
  # Gather the data
  gather(key = Type, value = Values, 1:4)
# View gathered data
head(dataset_gathered)
#> Species marker Type Values
#> 1 setosa 1 Sepal.Length 5.1
#> 2 setosa 2 Sepal.Length 4.9
#> 3 setosa 3 Sepal.Length 4.7
#> 4 setosa 4 Sepal.Length 4.6
#> 5 setosa 5 Sepal.Length 5.0
#> 6 setosa 6 Sepal.Length 5.4
# Spread it out again
dataset_spread <- dataset_gathered %>%
  # Group the data by the marker
  group_by(marker) %>%
  # Spread it out again
  spread(key = Type, value = Values) %>%
  # Not essential, but remove marker
  ungroup() %>%
  select(-marker)
# View spread data
head(dataset_spread)
#> # A tibble: 6 x 5
#> Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <fctr> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 1.4 0.2 5.1 3.5
#> 2 setosa 1.4 0.2 4.9 3.0
#> 3 setosa 1.3 0.2 4.7 3.2
#> 4 setosa 1.5 0.2 4.6 3.1
#> 5 setosa 1.4 0.2 5.0 3.6
#> 6 setosa 1.7 0.4 5.4 3.9
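If you want to convince yourself that nothing was lost on the way, a rough check could look like the sketch below. It is a variation on the code above, not the same code: it uses `row_number()` without an argument and skips the `group_by()` step, and the names `round_trip` and `restored` are made up for illustration. Since `spread()` returns the measurement columns in alphabetical order, they are put back into the original order before comparing.

```r
library(dplyr)
library(tidyr)

round_trip <- iris %>%
  # Unique marker per row, so spread() can tell duplicates apart
  mutate(marker = row_number()) %>%
  gather(key = Type, value = Values, 1:4) %>%
  spread(key = Type, value = Values) %>%
  # Restore the original row order, then drop the helper column
  arrange(marker) %>%
  select(-marker)

# Put the columns back into the same order as in iris
restored <- round_trip[, names(iris)]

# Compare the values only, ignoring attributes such as row names
all.equal(restored, iris, check.attributes = FALSE)
```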
(As always, thanks to Jenny Bryan for the reprex package.)