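I don't understand how the ymd function from the lubridate package works in R. I'm trying to build a feature that converts dates correctly without having to specify the format. To do that, I'm checking which of dmy(), mdy() and ymd() produces the fewest NAs.

ymd() sometimes returns NA and sometimes returns a date for the same kind of input. Is there any other function or package in R that would help me solve this problem?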

> data$DTTM[1:5]
[1] "4-Sep-06"  "27-Oct-06" "8-Jan-07"  "28-Jan-07" "5-Jan-07" 

> ymd(data$DTTM[1])
[1] NA
Warning message:
All formats failed to parse. No formats found. 
> ymd(data$DTTM[2])
[1] "2027-10-06 UTC"
> ymd(data$DTTM[3])
[1] NA
Warning message:
All formats failed to parse. No formats found. 
> ymd(data$DTTM[4])
[1] "2028-01-07 UTC"
> ymd(data$DTTM[5])
[1] NA
Warning message:
All formats failed to parse. No formats found. 
> 

> ymd(data$DTTM[1:5])
[1] "2004-09-06 UTC" "2027-10-06 UTC" "2008-01-07 UTC" "2028-01-07 UTC"
[5] "2005-01-07 UTC"

Thanks

3 Answers

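@user1317221_G has already pointed out that your dates are in day-month-year order, which means you should use dmy rather than ymd. In addition, because your months are in %b format ("Abbreviated month name in the current locale"; see ?strptime), your problem may be related to your locale. Your month names appear to be English, which may differ from how they are spelled in the locale you are currently using.

Let's see what happens when I try dmy on your dates in my locale: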

date_english <- c("4-Sep-06",  "27-Oct-06", "8-Jan-07",  "28-Jan-07", "5-Jan-07")
dmy(date_english)

# [1] "2006-09-04 UTC" NA               "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
# Warning message:
#  1 failed to parse.

"27-Oct-06" failed to parse. Let's check my time locale:

Sys.getlocale("LC_TIME")
# [1] "Norwegian (Bokmål)_Norway.1252"

In my locale, dmy does not recognize "Oct" as a valid %b month.

One way to deal with this is to change "Oct" to the corresponding Norwegian abbreviation, "Okt":
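To see which abbreviations %b will accept in a given session, you can list the month abbreviations of the current LC_TIME locale with base R. This is only a small sketch; the variable name month_abbr is made up for illustration.

```r
# Abbreviated month names (%b) as spelled by the current LC_TIME locale
month_abbr <- format(ISOdate(2006, 1:12, 1), "%b")
month_abbr

# Check whether the English abbreviation "Oct" is one of them
# (case-insensitive, because parsing of month names ignores case)
"oct" %in% tolower(month_abbr)
```

If the last line returns FALSE, strings such as "27-Oct-06" are likely to fail for any %b-based parser in that locale.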

date_nor <- c("4-Sep-06",  "27-Okt-06", "8-Jan-07",  "28-Jan-07", "5-Jan-07" )
dmy(date_nor)
# [1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"

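Another possibility is to keep the original dates (i.e. in their original "locale") and set the locale argument of dmy. Exactly how to do this depends on the platform (see ?locales). This is how I would do it on Windows: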

dmy(date_english, locale = "English")
# [1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
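Because locale names differ between platforms, a more portable variant may be to switch LC_TIME to the "C" locale temporarily, since "C" uses English month names everywhere. The following is a minimal base R sketch under that assumption; it uses as.Date instead of dmy, and the names old_locale and parsed are only illustrative.

```r
date_english <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")

old_locale <- Sys.getlocale("LC_TIME")  # remember the current setting
Sys.setlocale("LC_TIME", "C")           # the "C" locale uses English month names

# %d = day of month, %b = abbreviated month name, %y = two-digit year
parsed <- as.Date(date_english, format = "%d-%b-%y")

Sys.setlocale("LC_TIME", old_locale)    # restore the original setting
parsed
```

Restoring the previous locale afterwards keeps the rest of the session (printing, other parsing) unaffected.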
answered 2014-04-10T11:16:55.803

The guess_formats function from the lubridate package comes closest to what you are after.

library(lubridate)
x <- c("4-Sep-06", "27-Oct-06","8-Jan-07" ,"28-Jan-07","5-Jan-2007")
format <- guess_formats(x, c("mdY", "BdY", "Bdy", "bdY", "bdy", "mdy", "dby"))
strptime(x, format)
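If you would rather keep the "fewest NAs" idea from the question, it could be sketched roughly as below. parse_fewest_na is a hypothetical helper, not part of lubridate; it only relies on dmy(), mdy(), ymd() and their quiet argument. Ties are resolved in favour of the parser listed first.

```r
library(lubridate)

# Hypothetical helper: try each parser and keep the one that leaves the fewest NAs
parse_fewest_na <- function(x) {
  parsers <- list(dmy = dmy, mdy = mdy, ymd = ymd)
  # quiet = TRUE suppresses the "failed to parse" warnings while probing
  parsed <- lapply(parsers, function(f) f(x, quiet = TRUE))
  na_counts <- vapply(parsed, function(d) sum(is.na(d)), integer(1))
  best <- names(which.min(na_counts))  # first parser with the minimum count
  list(order = best, dates = parsed[[best]], na_counts = na_counts)
}

x <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")
res <- parse_fewest_na(x)
res$order
res$na_counts
```

Be aware that a low NA count does not prove the order is right: in the question, ymd() applied to the whole vector returned no NA at all, yet the resulting dates were wrong. The result also depends on the locale, as the other answer explains.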

HTH

answered 2014-04-10T12:04:44.653

From the documentation on ymd (page 70):

"As long as the order of formats is correct, these functions will parse dates correctly even when the input vectors contain differently formatted dates."

ymd() expects year-month-day, but you have day-month-year. For example, this works:

x <- c("2009-01-01", "2009-01-02", "2009-01-03")
ymd(x)

Maybe you need something like:

y <- c("4-Sep-06",  "27-Oct-06", "8-Jan-07",  "28-Jan-07", "5-Jan-07" )
as.POSIXct(y, format = "%d-%b-%y")

PS: I think the reason you get NA for some elements is that their first field, which ymd reads as the year, has only a single digit, and ymd doesn't know what to do with that. It works when that field has two digits, e.g. "27-Oct-06" and "28-Jan-07", but fails for "5-Jan-07" etc.

answered 2014-04-10T10:46:27.250