I have this data frame:
Source: local data frame [446,604 x 2]
              date pressure
1  2014_01_01_0:01      991
2  2014_01_01_0:02      991
3  2014_01_01_0:03      991
4  2014_01_01_0:04      991
5  2014_01_01_0:05      991
6  2014_01_01_0:06      991
7  2014_01_01_0:07      991
8  2014_01_01_0:08      991
9  2014_01_01_0:09      991
10 2014_01_01_0:10      991
..             ...      ...
I want to split the date column using separate() from tidyr:
library(tidyr)
separate(df, date, into = c("year", "month", "day", "time"), sep="_")
But it doesn't work. I managed to do it using substr() and mutate():
library(dplyr)
df %>%
  mutate(
    year = substr(date, 1, 4),
    month = substr(date, 6, 7),
    day = substr(date, 9, 10),
    # stop at 16 so two-digit hours (e.g. 10:01) are not truncated
    time = substr(date, 12, 16))
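Update:
It wasn't working because some of my rows are malformed. I was able to diagnose this with my initial substr() approach, and I found that I have strange entries in the data frame: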
df %>%
  select(date) %>%
  mutate(
    year = substr(date, 1, 4),
    month = substr(date, 6, 7),
    day = substr(date, 9, 10),
    # stop at 16 so two-digit hours (e.g. 10:01) are not truncated
    time = substr(date, 12, 16)) %>%
  group_by(year) %>%
  summarise(n = n())
This is what I get:
Source: local data frame [33 x 2]
   year      n
1  2014 446293
2  4164      9
3  4165     10
4  4166     10
5  4167     10
6  4168     10
7  4169     10
8  4170     10
9  4171     10
10 4172     10
11 4173     10
12 4174     10
13 4175     10
14 4176     10
15 4177     10
16 4178     10
17 4179     10
18 4180     10
19 4181     10
20 4182     10
21 4183     10
22 4184     10
23 4185     10
24 4186     10
25 4187     10
26 4188     10
27 4189     10
28 4190     10
29 4191     10
30 4192     10
31 4193     11
32 4194     10
33 4195      1
Is there a more efficient way to diagnose the structure of a column's elements and find the malformed rows before running separate()?
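
To make the kind of check I am after concrete, here is a minimal sketch using base R only. It assumes a well-formed value looks like YYYY_MM_DD_H:MM or YYYY_MM_DD_HH:MM. The toy data frame, the two malformed values in it, and the names pattern, is_valid, bad_rows and n_fields are all made up for illustration; they are not taken from my real data.

```r
# Toy data: the first two values follow the format shown above;
# the last two are invented examples of malformed entries.
df <- data.frame(
  date = c("2014_01_01_0:01", "2014_01_01_10:15",
           "41640.0006944", "2014-01-01 0:03"),
  pressure = c(991, 991, 990, 990),
  stringsAsFactors = FALSE
)

# Assumed shape of a valid value: YYYY_MM_DD_H:MM or YYYY_MM_DD_HH:MM
pattern <- "^[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{1,2}:[0-9]{2}$"

# TRUE where the whole string matches the expected shape
is_valid <- grepl(pattern, df$date)

# Rows that do not match and would break separate()
bad_rows <- df[!is_valid, ]

# Coarser check: number of "_"-delimited fields per row;
# anything other than 4 would not fit into = c("year", "month", "day", "time")
n_fields <- lengths(strsplit(df$date, "_", fixed = TRUE))
field_counts <- table(n_fields)
```

Printing bad_rows would list the offending entries directly, and field_counts gives a quick summary of how many rows have each number of fields. The regular expression is stricter than the field count, since it also checks that each piece is made of digits.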