string - R: Extract time components from semi-standard strings

Question

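Setup

I have a column of durations stored as strings in a data frame. I want to convert them to an appropriate time object, probably POSIXlt. Most of the strings are easy to parse with this approach: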

> data <- data.frame(time.string = c(
+   "1 d 2 h 3 m 4 s",
+   "10 d 20 h 30 m 40 s",
+   "--"))
> data$time.span <- strptime(data$time.string, "%j d %H h %M m %S s")
> data$time.span
[1] "2012-01-01 02:03:04" "2012-01-10 20:30:40" NA

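Missing durations are encoded as "--" and need to be converted to NA. This already happens, but it should be preserved.

The challenge is that the strings drop zero-valued elements. So the desired value 2012-01-01 02:00:14 would be written as the string "1 d 2 h 14 s". However, the simple parser turns this string into NA: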

> data2 <- data.frame(time.string = c(
+  "1 d 2 h 14 s",
+  "10 d 20 h 30 m 40 s",
+  "--"))
> data2$time.span <- strptime(data2$time.string, "%j d %H h %M m %S s")
> data2$time.span
[1] NA "2012-01-10 20:30:40" NA

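Questions

What is the "R Way" to handle all the possible string formats? Perhaps test for and extract each element separately, then recombine them?
Is POSIXlt the right target class? I need durations that are not tied to any particular start time, so the spurious year and month data (2012-01-) is a nuisance.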

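Solution

@mplourde definitely has the right idea: build the format string dynamically by testing for the various components in the duration string. Using cut(Sys.Date(), breaks='years') as the baseline for the difftime is also good, but it fails to account for a critical quirk in as.POSIXct(). (Note: I am using base R 2.11; this may have been fixed in later versions.)

The output of as.POSIXct() changes dramatically depending on whether a day component is included: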

> x <- "1 d 1 h 14 m 1 s"
> y <-     "1 h 14 m 1 s"  # Same string, no date component
> format (x)  # as specified below
[1] "%j d %H h %M m %S s"
> format (y)
[1] "%H h %M m %S s"
> as.POSIXct(x,format=format)  # Including the date baselines at year start
[1] "2012-01-01 01:14:01 EST"
> as.POSIXct(y,format=format)  # Excluding the date baselines at today start
[1] "2012-06-26 01:14:01 EDT"

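So the second argument to the difftime function should be:

the start of the first day of the current year, if the input string has a day component
the start of the current day, if the input string has no day component

This can be achieved by changing the unit argument passed to the cut function: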

parse.time <- function (x) {
  x <- as.character (x)
  break.unit <- ifelse(grepl("d",x),"years","days")  # chooses cut() unit
  format <- paste(c(if (grepl("d", x)) "%j d",
                    if (grepl("h", x)) "%H h",
                    if (grepl("m", x)) "%M m",
                    if (grepl("s", x)) "%S s"), collapse=" ")

  if (nchar(format) > 0) {
    difftime(as.POSIXct(x, format=format), 
             cut(Sys.Date(), breaks=break.unit),
             units="hours")
  } else {NA}

}

score 11 · Accepted Answer

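A difftime object is a duration that can be added to either a POSIXct or a POSIXlt object. Maybe you want to use this instead of POSIXlt?

Regarding the conversion from strings to time objects, you could do something like this: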
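To illustrate the point about difftime, here is a minimal sketch in base R. The start time and the 26-hour span are made-up values chosen only for the example; the span itself carries no date of its own.

```r
# An arbitrary start time (assumption for illustration), fixed to UTC
# so the result does not depend on the local time zone.
start <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")

# A duration with no calendar date attached.
span <- as.difftime(26, units = "hours")

# Adding the duration to a date-time shifts it by 26 hours.
end <- start + span
format(end, "%Y-%m-%d %H:%M:%S")
```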

data <- data.frame(time.string = c(
    "1 d 1 h",
    "30 m 10 s",
    "1 d 2 h 3 m 4 s",
    "2 h 3 m 4 s",
    "10 d 20 h 30 m 40 s",
    "--"))

f <- function(x) {
    x <- as.character(x)
    format <- paste(c(if (grepl('d', x)) '%j d',
                      if (grepl('h', x)) '%H h',
                      if (grepl('m', x)) '%M m',
                      if (grepl('s', x)) '%S s'), collapse=' ')

    if (nchar(format) > 0) {
        if (grepl('%j d', format)) {
            # '%j 1' is day 0. We add a day so that x = '1 d' means 24hrs.
            difftime(as.POSIXct(x, format=format) + as.difftime(1, units='days'), 
                    cut(Sys.Date(), breaks='years'),
                    units='hours')
        } else {
            as.difftime(x, format, units='hours')
        }
    } else { NA }
}

data$time.span <- sapply(data$time.string, FUN=f)
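If a baseline date is not wanted at all, another option is to pull out each component with a regular expression and add up the seconds, which is roughly the "extract each element separately" idea from the question. The following is only a sketch: parse_duration is a hypothetical helper name, it assumes the units are always the single letters d, h, m and s preceded by a number, and it assumes "1 d" means 24 hours, as in f above.

```r
# Hypothetical helper (not from any package): parse strings such as
# "1 d 2 h 14 s" into a difftime in seconds. Missing components count
# as zero; strings with no recognisable component become NA.
parse_duration <- function(x) {
  x <- as.character(x)
  secs_per_unit <- c(d = 86400, h = 3600, m = 60, s = 1)
  total <- numeric(length(x))
  found <- logical(length(x))
  for (unit in names(secs_per_unit)) {
    # Which strings contain "<number> <unit>"?
    hit <- grepl(paste0("[0-9]+ *", unit, "\\b"), x, perl = TRUE)
    # Extract the number that precedes this unit letter.
    value <- sub(paste0("^.*?([0-9]+) *", unit, "\\b.*$"), "\\1",
                 x[hit], perl = TRUE)
    total[hit] <- total[hit] + as.numeric(value) * secs_per_unit[[unit]]
    found <- found | hit
  }
  total[!found] <- NA
  as.difftime(total, units = "secs")
}

spans <- parse_duration(c("1 d 2 h 14 s", "10 d 20 h 30 m 40 s",
                          "30 m 10 s", "--"))
as.numeric(spans)
```

Because the function is vectorised, it can be applied to a whole column directly, without sapply.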

score 3 · Accepted Answer

I think you will have better luck with lubridate:

From Dates and Times Made Easy with lubridate:

5.3. Durations

...

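The length of a duration is invariant to leap years, leap seconds, and daylight savings time because durations are measured in seconds. Hence, durations have consistent lengths and can be easily compared to other durations. Durations are the appropriate object to use when comparing time-based attributes, such as speeds, rates, and lifetimes. lubridate uses the difftime class from base R for durations. Additional difftime methods have been created to facilitate this.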

...

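Duration objects can be easily created with the helper functions dyears(), dweeks(), ddays(), dhours(), dminutes(), and dseconds(). The d in the title stands for duration and distinguishes these objects from period objects, which are discussed in Section 5.4. Each object creates a duration in seconds using the estimated relationships given above.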
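As a rough sketch of those helpers in use, assuming the lubridate package is installed: the component values and the start time below are made up for illustration. Note that current lubridate releases, as far as I know, return their own Duration class rather than a plain difftime, so the class you see may differ from what the quoted text describes.

```r
library(lubridate)

# Build "1 d 2 h 14 s" from the duration helpers; each helper
# returns a length measured in seconds.
span <- ddays(1) + dhours(2) + dseconds(14)
as.numeric(span)  # total length in seconds

# A duration can be added to a date-time (arbitrary start, UTC assumed).
start <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")
end <- start + span
format(end, "%Y-%m-%d %H:%M:%S")
```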

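That said, I have not (yet) found a function that parses a string into a duration.

You might also look at Ruby's Chronic to see how elegant time parsing can be. I have not found a library like this for R.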
