
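I have untidy data in a data frame that looks like this.

Here the `team` column holds the names of some football teams. `name1` to `name3` are variables listing the different names used to refer to the team in the first column.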

               team             name1        name2      name3
1      Loughborough      Loughborough                        
2        Luton Town        Luton Town        Luton           
3      Macclesfield      Macclesfield                        
4  Maidstone United  Maidstone United                        
5   Manchester City   Manchester City     Man City           
6 Manchester United Manchester United Newton Heath Man United
7    Mansfield Town    Mansfield Town    Mansfield           
8      Merthyr Town      Merthyr Town                        

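My goal is to reshape the data into 2 columns holding the team-name1, team-name2, and team-name3 pairings. I only want to keep the pairings that actually have data in name1, name2, or name3.

To do this, I am trying tidyr's gather():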

temp <- dat %>% gather(key, value, 2:4) 
temp$key<-NULL
temp

This gives the following output:

                team             value
1       Loughborough      Loughborough
2         Luton Town        Luton Town
3       Macclesfield      Macclesfield
4   Maidstone United  Maidstone United
5    Manchester City   Manchester City
6  Manchester United Manchester United
7     Mansfield Town    Mansfield Town
8       Merthyr Town      Merthyr Town
9       Loughborough                  
10        Luton Town             Luton
11      Macclesfield                  
12  Maidstone United                  
13   Manchester City          Man City
14 Manchester United      Newton Heath
15    Mansfield Town         Mansfield
16      Merthyr Town                  
17      Loughborough                  
18        Luton Town                  
19      Macclesfield                  
20  Maidstone United                  
21   Manchester City                  
22 Manchester United        Man United
23    Mansfield Town                  
24      Merthyr Town                  

I tried to remove the incomplete cases (e.g. rows 20, 21, 23, and 24, but not row 22) using:

temp[complete.cases(temp),]

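This does not work, because the seemingly empty observations in value contain the character string "". I guess this is how gather() returns missing data? I also tried converting temp$value to a factor, but that did not work either.

I would love to hear how to get rid of the incomplete cases.

Sample data...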

dat<-structure(list(team = structure(1:8, .Label = c("Loughborough", 
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City", 
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"), 
    name1 = structure(1:8, .Label = c("Loughborough", "Luton Town", 
    "Macclesfield", "Maidstone United", "Manchester City", "Manchester United", 
    "Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L, 
    2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City", 
    "Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L, 
    1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team", 
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")

3 Answers

5

You can also add filter (to remove the blanks) and select (to remove the key column) from the dplyr package and get everything in one go:

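library(tidyr)
library(dplyr)

temp <- dat %>% 
  gather(key, value, 2:4) %>%   # stack name1..name3 into key/value pairs
  filter(value != "") %>%       # drop the rows whose name is blank
  select(-key)                  # drop the helper column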

#                 team             value
# 1       Loughborough      Loughborough
# 2         Luton Town        Luton Town
# 3       Macclesfield      Macclesfield
# 4   Maidstone United  Maidstone United
# 5    Manchester City   Manchester City
# 6  Manchester United Manchester United
# 7     Mansfield Town    Mansfield Town
# 8       Merthyr Town      Merthyr Town
# 9         Luton Town             Luton
# 10   Manchester City          Man City
# 11 Manchester United      Newton Heath
# 12    Mansfield Town         Mansfield
# 13 Manchester United        Man United
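
In newer tidyr releases (1.0.0 and later) pivot_longer() is the successor to gather(), so the same pipeline can be sketched as below. This is only an illustration under a few assumptions: `teams` is a made-up three-row stand-in for `dat`, built with character columns rather than factors, and the column names `key` and `value` are chosen to mirror the code above.

```r
library(tidyr)
library(dplyr)

# Hypothetical miniature of the question's data (character columns, not factors).
teams <- data.frame(
  team  = c("Luton Town", "Manchester United", "Merthyr Town"),
  name1 = c("Luton Town", "Manchester United", "Merthyr Town"),
  name2 = c("Luton", "Newton Heath", ""),
  name3 = c("", "Man United", ""),
  stringsAsFactors = FALSE
)

long <- teams %>%
  # Stack name1..name3 into one name column ("key") and one value column.
  pivot_longer(cols = name1:name3, names_to = "key", values_to = "value") %>%
  # Keep only the pairings that actually carry a name.
  filter(value != "") %>%
  # The source column name is not needed in the result.
  select(-key)

long
```

Note that pivot_longer() returns the rows grouped by team rather than by source column, so the row order differs from the gather() output above.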
answered 2014-07-30T19:39:41.990
2

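Are you looking for temp[temp$value!='',]? gather is not to blame for the empty strings; your initial data has them as well. You can replace them with NA first and then use the na.rm argument of gather: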

dat[dat==''] <- NA
temp <- dat %>% gather(key, value, 2:4, na.rm=TRUE) 
temp$key<-NULL
temp
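
To see why complete.cases() kept the blank rows, here is a small base R sketch. `mini` is a hypothetical two-row data frame invented for the illustration, not the original `dat`; the point is only that "" counts as a value until it is recoded as NA.

```r
# Hypothetical miniature of the question's data; not the original `dat`.
mini <- data.frame(
  team  = c("Luton Town", "Macclesfield"),
  name2 = c("Luton", ""),
  stringsAsFactors = FALSE
)

# An empty string is a value like any other, so every row counts as complete.
before <- complete.cases(mini)

# Recode the blanks as NA; only now is the second row reported as incomplete.
mini$name2[mini$name2 == ""] <- NA
after <- complete.cases(mini)

# Subsetting on complete cases now drops the blank row.
cleaned <- mini[after, ]
cleaned
```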
answered 2014-07-30T19:32:57.067
1

A similar approach, but using na.omit:

dat %>% 
  gather(key, value, -team) %>% 
  select(-key) %>%
  mutate(value = ifelse(value == "", NA, value)) %>%
  na.omit %>%
  arrange(team)
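
If tidyr and dplyr are not available, roughly the same steps can be done in base R. This is a sketch under assumptions, not a drop-in replacement: `teams` is a made-up three-row stand-in for `dat` with character columns, and `name_cols` is a helper variable introduced just for this example.

```r
# Hypothetical miniature of the question's data (character columns, not factors).
teams <- data.frame(
  team  = c("Luton Town", "Manchester United", "Merthyr Town"),
  name1 = c("Luton Town", "Manchester United", "Merthyr Town"),
  name2 = c("Luton", "Newton Heath", ""),
  name3 = c("", "Man United", ""),
  stringsAsFactors = FALSE
)

name_cols <- c("name1", "name2", "name3")

# Stack the name columns on top of each other, repeating team once per column.
long <- data.frame(
  team  = rep(teams$team, times = length(name_cols)),
  value = unlist(teams[name_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Turn blanks into NA, drop them, then sort by team as arrange(team) does.
long$value[long$value == ""] <- NA
long <- na.omit(long)
long <- long[order(long$team), ]
long
```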
answered 2015-01-02T15:44:38.907