
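I am having trouble using the pad function (from the padr package) to fill gaps in a time series. I have some code that downloads hourly data from a server, one day at a time, over a given period. After each day's data is downloaded, the aim is to use pad to clean the data and add the missing times and dates, so that the days can be combined properly without errors.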

The function downloads the data as a list, like this:

 time                  temperature
2019-11-11 00:00:00          3
2019-11-11 01:00:00          4 
2019-11-11 03:00:00          5

I would like a procedure that fills it in automatically, like this:

 time                  temperature
2019-11-11 00:00:00          3
2019-11-11 01:00:00          4 
2019-11-11 02:00:00          NA
2019-11-11 03:00:00          5

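I used pad in the code below to fill the gaps, but if the data starts at 02:00:00, the padding starts from that time step. When I use start_val and end_val, there seems to be a problem recognizing the date and time. Any help would be greatly appreciated. I have tried many workarounds, with no luck. Keep in mind that the dates vary and there is no way to know in advance which hour is missing.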

    if (nrow(daily$hourly) < 24) {
    daily$hourly <- daily$hourly %>% pad(daily$hourly$time, start_val = as.POSIXct('00:00:00'),end_val = as.POSIXct('23:00:00') %>% fill_by_value(value)
  }

**Update**

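I think the main problem is that R does not recognize 00:00:00 as the start of the time series, so it does not fill 01:00:00 as a gap. Both solutions work if the gap is somewhere else. Any ideas? Please see the structure below.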

structure(list(time = structure(c(1521936000, 1521939600, 1521943200, 
1521946800, 1521950400, 1521954000, 1521957600, 1521961200, 1521964800, 
1521968400, 1521972000, 1521975600, 1521979200, 1521982800, 1521986400, 
1521990000, 1521993600, 1521997200, 1522000800, 1522004400, 1522008000, 
1522011600, 1522015200), class = c("POSIXct", "POSIXt"), tzone = ""), 
    summary = c("Overcast", "Overcast", "Overcast", "Overcast", 
    "Overcast", "Overcast", "Overcast", "Foggy", "Mostly Cloudy", 
    "Mostly Cloudy", "Overcast", "Mostly Cloudy", "Mostly Cloudy", 
    "Mostly Cloudy", "Mostly Cloudy", "Mostly Cloudy", "Partly Cloudy", 
    "Partly Cloudy", "Partly Cloudy", "Partly Cloudy", "Partly Cloudy", 
    "Clear", "Clear"), icon = c("cloudy", "cloudy", "cloudy", 
    "cloudy", "cloudy", "cloudy", "cloudy", "fog", "partly-cloudy-day", 
    "partly-cloudy-day", "cloudy", "partly-cloudy-day", "partly-cloudy-day", 
    "partly-cloudy-day", "partly-cloudy-day", "partly-cloudy-day", 
    "partly-cloudy-day", "partly-cloudy-day", "partly-cloudy-day", 
    "partly-cloudy-night", "partly-cloudy-night", "clear-night", 
    "clear-night"), precipIntensity = c(0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L), precipProbability = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L), temperature = c(7.28, 7.3, 7.21, 7.08, 7.03, 7.02, 7.15, 
    7.19, 7.38, 7.83, 8.43, 9.35, 9.89, 10.54, 10.81, 11.07, 
    11.55, 11.31, 10.52, 9.67, 8.67, 7.94, 6.93), apparentTemperature = c(7.28, 
    7.3, 7.21, 7.08, 7.03, 7.02, 7.15, 7.19, 7.38, 7.33, 8.43, 
    9.35, 9.64, 10.54, 10.81, 11.07, 11.55, 11.31, 10.52, 9.67, 
    8.67, 7.94, 6.93), dewPoint = c(4.99, 5.07, 5.03, 4.99, 4.86, 
    5.04, 5.41, 5.6, 5.55, 5.62, 5.57, 5.79, 5.84, 5.7, 5.4, 
    5.08, 4.4, 4.2, 4.37, 4.32, 4.02, 4.06, 3.73), humidity = c(0.85, 
    0.86, 0.86, 0.87, 0.86, 0.87, 0.89, 0.9, 0.88, 0.86, 0.82, 
    0.78, 0.76, 0.72, 0.69, 0.67, 0.61, 0.62, 0.66, 0.69, 0.73, 
    0.76, 0.8), pressure = c(1005.4, 1005.7, 1006, 1006.4, 1006.7, 
    1007.2, 1007.7, 1008.6, 1009.4, 1010.3, 1010.9, 1011.6, 1011.7, 
    1012.1, 1012.2, 1012.3, 1012.4, 1012.6, 1013.3, 1013.8, 1014.5, 
    1014.8, 1015.3), windSpeed = c(0.35, 0.48, 0.55, 0.33, 0.36, 
    0.6, 0.85, 1.05, 1.29, 1.38, 0.89, 1.33, 1.39, 1.44, 1.63, 
    1.57, 1.46, 1.27, 0.57, 0.23, 0.03, 0.27, 0.2), windGust = c(0.48, 
    0.81, 0.95, 0.42, 0.44, 0.96, 1.14, 1.28, 2.03, 1.99, 1.72, 
    2.51, 2.48, 2.66, 2.48, 2.46, 2.42, 1.67, 0.65, 0.27, 0.03, 
    0.27, 0.2), windBearing = c(28L, 6L, 12L, 1L, 12L, 3L, 12L, 
    23L, 40L, 41L, 26L, 22L, 15L, 21L, 9L, 11L, 10L, 18L, 16L, 
    17L, NA, 273L, 284L), cloudCover = c(0.98, 0.98, 0.98, 0.93, 
    0.89, 0.93, 0.97, 0.94, 0.82, 0.83, 0.99, 0.75, 0.75, 0.75, 
    0.75, 0.73, 0.51, 0.49, 0.46, 0.46, 0.44, 0.1, 0), uvIndex = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L), visibility = c(6.74, 6.064, 
    6.532, 6.035, 6.054, 6.006, 4.033, 3.047, 4.369, 5.512, 6.856, 
    8.129, 9.269, 9.488, 10.003, 10.003, 10.003, 10.003, 10.003, 
    10.003, 10.003, 10.003, 9.521)), row.names = c(NA, -23L), class = "data.frame")

2 Answers


You can use complete from tidyr and create an hourly sequence between the min and max of time:

tidyr::complete(df, time = seq(min(time), max(time), by = "1 hour"))

#  time                temperature
#  <dttm>                    <int>
#1 2019-11-11 00:00:00           3
#2 2019-11-11 01:00:00           4
#3 2019-11-11 02:00:00          NA
#4 2019-11-11 03:00:00           5

Data

df <- structure(list(time = structure(c(1573401600, 1573405200, 1573412400
), class = c("POSIXct", "POSIXt"), tzone = ""), temperature = 3:5), 
row.names = c(NA, -3L), class = "data.frame")
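
If the day has to be completed from 00:00:00 through 23:00:00 even when the first or last hours are missing, the same complete call can be given a full-day sequence instead of one bounded by min(time) and max(time). The following is only a sketch under assumptions that are not in the question: the timestamps are treated as UTC, each data frame holds a single day, and the names df_late, day_start, full_day and out are made up for illustration.

```r
library(tidyr)

# Hypothetical example: the first observation is at 02:00, so min(time) is not midnight
df_late <- data.frame(
  time = as.POSIXct(c("2019-11-11 02:00:00", "2019-11-11 03:00:00"), tz = "UTC"),
  temperature = c(4L, 5L)
)

# Midnight of the day the data belongs to (assumes one day per data frame, UTC)
day_start <- as.POSIXct(format(min(df_late$time), "%Y-%m-%d"), tz = "UTC")

# All 24 hourly time steps of that day
full_day <- seq(day_start, by = "1 hour", length.out = 24)

# Hours without an observation are added with NA in the other columns
out <- complete(df_late, time = full_day)
```

Because the sequence is derived from the date of the data itself, this should not need to know which hour is missing or which date is being processed.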
answered 2019-11-09T11:01:00.893

padr::pad takes a data frame as its first argument, so it does not work with the vector you are supplying now. All you need to do is:

x <- data.frame(
  time = as.POSIXct(c('2019-11-11 00:00:00','2019-11-11 01:00:00','2019-11-11 03:00:00')),
  temperature = 3:5
)
padr::pad(x)
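
On the start_val and end_val part of the question: as.POSIXct('00:00:00') contains no date, so R cannot parse it into a date-time, and pad needs full date-times of the same class as the time column. A possible sketch follows, assuming the timestamps are in UTC and each data frame covers a single day; x_late, day_start, day_end and padded are illustrative names, not part of the original code.

```r
library(padr)

# Hypothetical example: data for one day that only starts at 02:00
x_late <- data.frame(
  time = as.POSIXct(c("2019-11-11 02:00:00", "2019-11-11 03:00:00"), tz = "UTC"),
  temperature = c(4L, 5L)
)

# Build full date-times for the first and last hour of that day
day_start <- as.POSIXct(format(min(x_late$time), "%Y-%m-%d"), tz = "UTC")
day_end   <- day_start + 23 * 3600

# Pad at hourly resolution from 00:00 to 23:00; added rows get NA
padded <- pad(x_late, interval = "hour",
              start_val = day_start, end_val = day_end, by = "time")
```

The NA values in the padded rows can then be replaced with fill_by_value if a default value is wanted.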
answered 2019-11-12T12:35:04.813