
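I'm trying to use the spread function from tidyr on a dataset with origin and destination names, to understand plane journeys and their passenger counts. I want to build a table that can eventually be used for a heat map, so I'd like the Origin variable in the rows and the Destination variable as the columns.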

I've tried running the code with different combinations of arguments, and I've also tried spread_, but I always end up with an error.

If I use spread_ with key_col and val_col, I get:

Error in match(x, table, nomatch = 0L) : object 'Destination.Region' not found

On my large dataset, it produces a different kind of error:

Error in `colnames<-`(`*tmp*`, value = c("ASIA SUB-CONTINENT", "AUSTRALIA", : length of 'dimnames' [2] not equal to array extent

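This is my first time using tidyr and I'm still getting to know these packages. It doesn't sound too complicated, but I've been working on this problem for hours and can't find an answer on any forum.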

Thanks for your help,

Here is an example of what the data looks like:

data2<-matrix(NA, nrow = 7, ncol=3)  
colnames(data2)<-c("Origin.Destination", "Total.Passengers", "Destination.Region")
data2[,1] <- c("EAST AFRICA","SOUTHERN AFRICA","WEST AFRICA", "EAST AFRICA", "SOUTHERN AFRICA", "EAST AFRICA","EAST AFRICA")
data2[,2] <- c(100, 5000, 200, 10000, 200, 20, 4000)
data2[,3] <- c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA", "SOUTH AMERICA", "ASIA", "WESTERN EUROPE", "WESTERN EUROPE")

data2<-data.frame(data2)

Here is my code:

DF<- 
  data2 %>%
  spread_(key_ = "Destination.Region",
     value_ = "Total.Passengers", 
     convert = TRUE,
     drop = FALSE)

1 Answer


Here are a few things to try:

1) I would convert data2 to a data.frame. It makes it much easier to work with.

data2<-matrix(NA, nrow = 7, ncol=3)  
colnames(data2)<-c("Origin.Destination", "Total.Passengers", "Destination.Region")
data2[,1] <- c("EAST AFRICA","SOUTHERN AFRICA","WEST AFRICA", "EAST AFRICA", "SOUTHERN AFRICA", "EAST AFRICA","EAST AFRICA")
data2[,2] <- c(100, 5000, 200, 10000, 200, 20, 4000)
data2[,3] <- c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA", "SOUTH AMERICA", "ASIA", "WESTERN EUROPE", "WESTERN EUROPE")

data3<-data.frame(data2)

2) The new data.frame needs a column that uniquely identifies each row (usually an index column) for the spread_ function to work properly. Otherwise:

DF<- 
  data3 %>%
  spread_(key_ = "Destination.Region",
          value_ = "Total.Passengers", 
          convert = TRUE,
          drop = FALSE)

Error: Duplicate identifiers for rows (1, 6, 7)

But if you add an index column:

data3$index<-1:nrow(data3)

DF<- 
  data3 %>%
  spread_(key_ = "Destination.Region",
          value_ = "Total.Passengers", 
          convert = TRUE,
          drop = FALSE)
DF

   Origin.Destination index ASIA SOUTH AMERICA WESTERN EUROPE
1         EAST AFRICA     1   NA            NA            100
2         EAST AFRICA     2   NA            NA             NA
3         EAST AFRICA     3   NA            NA             NA
4         EAST AFRICA     4   NA         10000             NA
5         EAST AFRICA     5   NA            NA             NA
6         EAST AFRICA     6   NA            NA             20
7         EAST AFRICA     7   NA            NA           4000
8     SOUTHERN AFRICA     1   NA            NA             NA
9     SOUTHERN AFRICA     2   NA          5000             NA
10    SOUTHERN AFRICA     3   NA            NA             NA
11    SOUTHERN AFRICA     4   NA            NA             NA
12    SOUTHERN AFRICA     5  200            NA             NA
13    SOUTHERN AFRICA     6   NA            NA             NA
14    SOUTHERN AFRICA     7   NA            NA             NA
15        WEST AFRICA     1   NA            NA             NA
16        WEST AFRICA     2   NA            NA             NA
17        WEST AFRICA     3  200            NA             NA
18        WEST AFRICA     4   NA            NA             NA
19        WEST AFRICA     5   NA            NA             NA
20        WEST AFRICA     6   NA            NA             NA
21        WEST AFRICA     7   NA            NA             NA
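
The "Duplicate identifiers" error appears because several rows share the same origin and destination, so spread_ cannot tell which value belongs in a given cell. Below is a minimal base-R sketch of how one might find those rows before spreading. The names trips, id_cols, is_dup and dup_rows are made up for illustration, and the data frame is built directly (rather than from a matrix) so that Total.Passengers stays numeric.

```r
# Illustrative data frame; building it directly keeps Total.Passengers numeric
trips <- data.frame(
  Origin.Destination = c("EAST AFRICA", "SOUTHERN AFRICA", "WEST AFRICA",
                         "EAST AFRICA", "SOUTHERN AFRICA", "EAST AFRICA",
                         "EAST AFRICA"),
  Total.Passengers   = c(100, 5000, 200, 10000, 200, 20, 4000),
  Destination.Region = c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA",
                         "SOUTH AMERICA", "ASIA", "WESTERN EUROPE",
                         "WESTERN EUROPE"),
  stringsAsFactors = FALSE
)

# Every column other than the value column acts as an identifier when spreading
id_cols <- trips[, c("Origin.Destination", "Destination.Region")]

# Flag each row whose identifier combination occurs more than once
is_dup <- duplicated(id_cols) | duplicated(id_cols, fromLast = TRUE)
dup_rows <- which(is_dup)
dup_rows
```

On this sample it flags rows 1, 6 and 7, the same rows named in the error message, all of them EAST AFRICA to WESTERN EUROPE.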

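One thing that might make sense here is to sum the total passengers by origin and destination. That avoids the need for an index and prevents so many NAs: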

library(dplyr)  # group_by, summarise, %>%
library(tidyr)  # spread

Origin <- c("EAST AFRICA","SOUTHERN AFRICA","WEST AFRICA", "EAST AFRICA", "SOUTHERN AFRICA", "EAST AFRICA","EAST AFRICA")
Passengers <- c(100, 5000, 200, 10000, 200, 20, 4000)
Destination <- c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA", "SOUTH AMERICA", "ASIA", "WESTERN EUROPE", "WESTERN EUROPE")
data3<-data.frame(Origin, Passengers, Destination)

DF<-data3 %>% group_by(Origin, Destination) %>%
  summarise(Total.Passengers = sum(Passengers)) %>%
  spread(Destination, Total.Passengers)

DF

           Origin  ASIA SOUTH AMERICA WESTERN EUROPE
           (fctr) (dbl)         (dbl)          (dbl)
1     EAST AFRICA    NA         10000           4120
2 SOUTHERN AFRICA   200          5000             NA
3     WEST AFRICA   200            NA             NA
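
Since the end goal is a heat map, a base-R alternative may also be worth knowing: xtabs sums a value over combinations of factors and returns a matrix-like table. The sketch below is not the approach used above. The names pax and pax_mat are made up for illustration, and empty origin/destination pairs come out as 0 rather than NA.

```r
Origin <- c("EAST AFRICA", "SOUTHERN AFRICA", "WEST AFRICA", "EAST AFRICA",
            "SOUTHERN AFRICA", "EAST AFRICA", "EAST AFRICA")
Passengers <- c(100, 5000, 200, 10000, 200, 20, 4000)
Destination <- c("WESTERN EUROPE", "SOUTH AMERICA", "ASIA", "SOUTH AMERICA",
                 "ASIA", "WESTERN EUROPE", "WESTERN EUROPE")
data3 <- data.frame(Origin, Passengers, Destination)

# Sum Passengers for every Origin x Destination pair (rows = Origin)
pax <- xtabs(Passengers ~ Origin + Destination, data = data3)

# Drop the xtabs/table classes to leave a plain numeric matrix
pax_mat <- unclass(pax)
pax_mat
```

Something like heatmap(pax_mat, Rowv = NA, Colv = NA) could then draw the matrix without reordering rows or columns. Whether 0 or NA is the better fill for routes with no data depends on how the heat map should treat them.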
answered 2016-06-13T15:19:02.160