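I have untidy data in a data frame that looks like this.
Here you can see the names of some football teams in "team". name1 to name3 are variables listing the different names that have been used to refer to the team in the first column.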
  team              name1             name2        name3
1 Loughborough      Loughborough
2 Luton Town        Luton Town        Luton
3 Macclesfield      Macclesfield
4 Maidstone United  Maidstone United
5 Manchester City   Manchester City   Man City
6 Manchester United Manchester United Newton Heath Man United
7 Mansfield Town    Mansfield Town    Mansfield
8 Merthyr Town      Merthyr Town
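My goal is to get the data into 2 columns containing the team-name1, team-name2 and team-name3 pairings. I only want to keep the pairings that actually have data in name1, name2 or name3.
To do this, I am trying tidyr's gather():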
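library(tidyr)  # provides gather(); tidyr also re-exports %>%
temp <- dat %>% gather(key, value, 2:4)  # stack columns 2 to 4 (name1..name3) into key/value pairs
temp$key <- NULL  # drop the column recording which name variable each value came from
temp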
This gives the following output:
   team              value
1  Loughborough      Loughborough
2  Luton Town        Luton Town
3  Macclesfield      Macclesfield
4  Maidstone United  Maidstone United
5  Manchester City   Manchester City
6  Manchester United Manchester United
7  Mansfield Town    Mansfield Town
8  Merthyr Town      Merthyr Town
9  Loughborough
10 Luton Town        Luton
11 Macclesfield
12 Maidstone United
13 Manchester City   Man City
14 Manchester United Newton Heath
15 Mansfield Town    Mansfield
16 Merthyr Town
17 Loughborough
18 Luton Town
19 Macclesfield
20 Maidstone United
21 Manchester City
22 Manchester United Man United
23 Mansfield Town
24 Merthyr Town
I tried to remove the incomplete cases (e.g. rows 20, 21, 23 and 24, but not row 22) using:
temp[complete.cases(temp),]
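This doesn't work, because the seemingly empty observations in value actually contain an empty string "". I'm guessing that this is how gather() returns missing data? I tried converting temp$value to a factor, but that didn't work either.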
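To show what I think is going on, here is a minimal base-R sketch. `temp_small` is a hypothetical, hand-built miniature of `temp` (not the real output), so it runs without any packages. It suggests that complete.cases() only looks for NA and treats "" as an ordinary value. Looking at the sample data, "" is already a factor level in name2 and name3, so the empty strings may come from the original data rather than from gather(). Recoding "" to NA first appears to make the filter behave, though I don't know whether this is the recommended approach.

```r
# Hypothetical miniature of `temp`, built by hand for illustration only.
temp_small <- data.frame(
  team  = c("Luton Town", "Luton Town", "Manchester United", "Manchester United"),
  value = c("Luton", "", "Newton Heath", "Man United"),
  stringsAsFactors = FALSE
)

# "" is a valid string, not NA, so every row is reported as complete.
all_complete <- complete.cases(temp_small)

# Recode empty strings as NA, then drop the rows that are now incomplete.
cleaned <- temp_small
cleaned$value[cleaned$value == ""] <- NA
cleaned <- cleaned[complete.cases(cleaned), ]
cleaned
```

If `value` were a factor instead of a character vector, I assume it would need as.character() before the comparison with "".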
I would love to hear how to get rid of the incomplete cases.
Sample data:
dat<-structure(list(team = structure(1:8, .Label = c("Loughborough",
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City",
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"),
name1 = structure(1:8, .Label = c("Loughborough", "Luton Town",
"Macclesfield", "Maidstone United", "Manchester City", "Manchester United",
"Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L,
2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City",
"Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team",
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")