r - lubridate - select first non-Monday of every week.

Question

Having a tibble of financial data, I would like to filter it by only selecting the first non-Monday of every week. Usually it will be a Tuesday, but sometimes it can be a Wednesday if Tuesday is a Holiday.

Here is my code that works in most cases

XLF <- quantmod::getSymbols("XLF", from = "2000-01-01", auto.assign = FALSE)

library(tibble)
library(lubridate)
library(dplyr)
xlf <- as_tibble(XLF) %>% rownames_to_column(var = "date") %>% 
         select(date, XLF.Adjusted)  
xlf$date <- ymd(xlf$date)

# We create Month, Week number and Days of the week columns
# Then we remove all the Mondays
xlf <- xlf %>% mutate(Year = year(date), Month = month(date), 
                      IsoWeek = isoweek(date), WDay = wday(date)) %>% 
               filter(WDay != 2)

# Creating another tibble just for ease of comparison
xlf2 <- xlf %>% 
          group_by(Year, IsoWeek) %>% 
          filter(row_number() == 1) %>% 
          ungroup()

That said, there are some issues that I have not been able to solve so far.

The issue is for instance that it is skipping "2002-12-31" which is a Tuesday because it is considered as part of the first ISO week of 2003. There are a few similar issues.
My question is how could I select of the first non-Monday of every week without such issues while staying in the tidyverse (ie. not having to use xts / zoo class)?

score 3 · Accepted Answer

您可以自己创建一个持续增加的周数。也许不是最优雅的解决方案，但对我来说效果很好。

as_tibble(XLF) %>% 
  rownames_to_column(var = "date")%>% 
  select(date, XLF.Adjusted)%>%
  mutate(date = ymd(date),
         Year = year(date),
         Month = month(date),
         WDay = wday(date),
         WDay_label = wday(date, label = T))%>% 
  # if the weekday number is higher in the line above or 
  # if the date in the previous line is more than 6 days ago
  # the week number should be incremented
  mutate(week_increment  = (WDay < lag(WDay) | difftime(date, lag(date), unit = 'days') > 6))%>%
  # the previous line causes the first element to be NA due to 
  # the fact that the lag function can't find a line above
  # we correct this here by setting the first element to TRUE
  mutate(week_increment = ifelse(row_number() == 1,
                                 TRUE,
                                 week_increment))%>%
  # we can sum the boolean elements in a cumulative way to get a week number
  mutate(week_number = cumsum(week_increment))%>%
  filter(WDay != 2)%>%
  group_by(Year, week_number) %>% 
  filter(row_number() == 1)

r - lubridate - select first non-Monday of every week.

1 回答 1

Related

Reference