
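I have a .txt file (with no clear column separator) in which every line contains a timestamp in the format %Y-%m-%d %H:%M:%OS3 (e.g. "2019-09-26 07:29:22,778") and an event string. I would like to read in the data and build a table that shows the full timestamp in one column, the event in a second column, and in a third column the time span in OS3 format (e.g. "1.230" or "1,230" seconds) between the event in line 1 and the event in line 2, then between the event in line 1 and the event in line 3, and so on.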

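I tried reading the file after using "[" as a separator in Excel and saving it as .tsv, which is an unsatisfying workaround. However, the subsequent use of difftime with dplyr did not produce results that include milliseconds, even though the global option had been set to 3 digits for seconds ("options(digits.secs=3)").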

What the .txt looks like:

2019-09-26 17:54:24,406 [218] INFO  - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO  - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO  - [1] - Event Z
...
.
.

What I would like to get:

timestamp                   event            timediff in sec
2019-09-26 17:54:24,406     Event X
2019-09-26 17:54:24,431     Event Y          0.025
2019-09-26 17:54:24,438     Event Z          0.032
...
.
.

2 Answers


Here you go:

df <- data.table::fread(text = "2019-09-26 17:54:24,406 [218] INFO  - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO  - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO  - [1] - Event Z", sep = "[", header = FALSE) # [ seems most convenient to use as sep
colnames(df) <- c("timestamp", "garbage", "event")

df
#>                  timestamp      garbage        event
#> 1: 2019-09-26 17:54:24,406 218] INFO  - 1] - Event X
#> 2: 2019-09-26 17:54:24,431 207] INFO  - 1] - Event Y
#> 3: 2019-09-26 17:54:24,438 218] INFO  - 1] - Event Z

library(dplyr)
library(stringr)


df_clean <- df %>% 
  select(-garbage) %>% 
  mutate(timestamp = str_replace(timestamp, ",", ".")) %>%  # comma must be replaced so milliseconds are recognised
  mutate(timestamp = as.POSIXct(timestamp, format = "%Y-%m-%d %H:%M:%OS"),
         event = str_extract(event, "Event.*"),
         start_time = min(timestamp), # adding the first timestamp as new column, could be removed later
         "timediff in sec" = as.numeric(timestamp - start_time, units = "secs")) # this converts difftime to numeric


df_clean
#>             timestamp   event          start_time timediff in sec
#> 1 2019-09-26 17:54:24 Event X 2019-09-26 17:54:24      0.00000000
#> 2 2019-09-26 17:54:24 Event Y 2019-09-26 17:54:24      0.02500010
#> 3 2019-09-26 17:54:24 Event Z 2019-09-26 17:54:24      0.03200006

Created on 2019-10-10 by the reprex package (v0.3.0)
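The printed timestamps above show no fractional seconds, and the differences carry floating-point noise (0.02500010 instead of 0.025). A minimal sketch of one way to tidy both, assuming the timestamps have already been converted to dot-separated strings; the `tz = "UTC"` setting and the variable names `ts` and `diff_sec` are assumptions made for this illustration, not part of the answer above:

```r
# Sketch: make the fractional seconds visible and round the differences.
ts <- as.POSIXct(c("2019-09-26 17:54:24.406",
                   "2019-09-26 17:54:24.431",
                   "2019-09-26 17:54:24.438"),
                 format = "%Y-%m-%d %H:%M:%OS",
                 tz = "UTC")  # assumed time zone, only to keep the example reproducible

# %OS3 prints three decimal places; it truncates rather than rounds,
# so the last digit shown can be one lower than in the input
format(ts, "%Y-%m-%d %H:%M:%OS3")

# Differences to the first timestamp, rounded to whole milliseconds
diff_sec <- round(as.numeric(difftime(ts, ts[1], units = "secs")), 3)
diff_sec
```

Rounding the numeric difference to three digits removes the representation noise without changing the stored timestamps; setting options(digits.secs = 3) would only affect how the POSIXct values are printed.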

answered 2019-10-10T11:53:33.500

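You can use [ as the separator with read.delim. The problem with the 3 digits arises because your timestamps use a comma rather than a dot as the decimal separator. This can be fixed with str_replace (or gsub).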

library(dplyr)
library(stringr)

my_df <- read.delim(text = "
2019-09-26 17:54:24,406 [218] INFO  - [1] - Event X
2019-09-26 17:54:24,431 [207] INFO  - [1] - Event Y
2019-09-26 17:54:24,438 [218] INFO  - [1] - Event Z", 
sep = "[", header = FALSE, col.names = c("timestamp", "info", "event"))

my_df
#                 timestamp          info         event
# 1 2019-09-26 17:54:24,406  218] INFO  -  1] - Event X
# 2 2019-09-26 17:54:24,431  207] INFO  -  1] - Event Y
# 3 2019-09-26 17:54:24,438  218] INFO  -  1] - Event Z

my_df %>% 
  # drop the info column
  select(-info) %>% 
  mutate(# remove anything not related to the Event
         event = str_remove(event, ".*Event"), 
         # replace , with .
         timestamp = str_replace_all(timestamp, ",", "."),
         # transform to a proper timestamp
         timestamp = as.POSIXct(timestamp, format="%Y-%m-%d %H:%M:%OS"), 
         # calculate difftime (as proposed in your previous question [1])
         difftime = difftime(timestamp, timestamp[1], units = "secs"))
#                 timestamp event        difftime
# 1 2019-09-26 17:54:24.405     X 0.00000000 secs
# 2 2019-09-26 17:54:24.430     Y 0.02500010 secs
# 3 2019-09-26 17:54:24.437     Z 0.03200006 secs
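
Since gsub is mentioned as an alternative, the same idea can be sketched in base R alone, working on the raw lines so that no column separator is needed. This is only an illustration under stated assumptions: the timestamp is taken to occupy the first 23 characters of every line, the event is taken to be whatever follows the last "] - ", and the names `lines`, `stamp`, `time`, `event`, `result` and the file name "log.txt" are hypothetical:

```r
# Sketch: parse the raw log lines with base R only (no separator needed).
lines <- c("2019-09-26 17:54:24,406 [218] INFO  - [1] - Event X",
           "2019-09-26 17:54:24,431 [207] INFO  - [1] - Event Y",
           "2019-09-26 17:54:24,438 [218] INFO  - [1] - Event Z")
# For a real file: lines <- readLines("log.txt")  # hypothetical file name

# Assumption: the timestamp is the first 23 characters of each line;
# swap the decimal comma for a dot so %OS can read the milliseconds
stamp <- sub(",", ".", substr(lines, 1, 23), fixed = TRUE)
time  <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Assumption: the event is everything after the last "] - "
event <- sub("^.*\\] - ", "", lines)

result <- data.frame(
  timestamp = stamp,
  event = event,
  # seconds since the first line, rounded to whole milliseconds
  timediff_sec = round(as.numeric(difftime(time, time[1], units = "secs")), 3),
  stringsAsFactors = FALSE
)
result
```

Keeping the timestamp column as a character string preserves the milliseconds exactly as they appear in the file, while the parsed POSIXct values are used only for the arithmetic.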

[1] How to make a time span column based on a timestamp column?

answered 2019-10-10T11:53:58.723