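I created a script that reads data from two GitHub repositories, reformats the datasets, binds them together by rows, and then writes everything to a new .csv file. I then scheduled the script to run every hour using the cronR package.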
Here is my code:
```r
devtools::install_github("tidyverse/googlesheets4")
library(dplyr)
library(googlesheets4)
library(RCurl)

setwd(dir = "YOUR_WORKING_DIRECTORY")

###############################################################################
#================== TIME SERIES DATA FOR CASES AND DEATHS ====================#
###############################################################################

# 1. #####==== DATASETS =====#####
# 1.1 ###= Cases #####
# These files are updated on GitHub every day.
cases <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/openZH/covid_19/master/COVID19_Cases_Cantons_CH_total.csv"),
                  header = TRUE,
                  stringsAsFactors = FALSE,
                  na.strings = c("", "NA"),
                  encoding = "UTF-8")

# Remove the rows for the whole of Switzerland (CH) and for Liechtenstein (FL)
cases <- subset(x = cases,
                !is.element(el = canton,
                            set = c("CH", "FL")),
                select = c("date",
                           "canton",
                           "tested_pos"))
names(cases)[1] <- "Date"

# Reshape from long to wide (one column per canton) to match the layout of the second dataset
cases <- reshape(data = cases,
                 idvar = "Date",
                 timevar = "canton",
                 v.names = "tested_pos",
                 direction = "wide")
# Drop the "tested_pos." prefix; fixed = TRUE so "." is matched literally, not as a regex wildcard
names(cases) <- gsub(pattern = "tested_pos.",
                     replacement = "",
                     x = names(cases),
                     fixed = TRUE)
cases[is.na(cases)] <- 0
cases <- cases[order(cases$Date,
                     decreasing = FALSE), ]

# More up-to-date dataset
cases2 <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/daenuprobst/covid19-cases-switzerland/master/covid19_cases_switzerland.csv"),
                   header = TRUE,
                   stringsAsFactors = FALSE,
                   na.strings = c("", "NA"),
                   encoding = "UTF-8")

# Remove the total daily cases for Switzerland
cases2 <- subset(x = cases2,
                 select = -c(CH))

# Row-bind the first 7 rows of the first dataset with the second dataset
cases_tot <- bind_rows(cases[1:7, ],
                       cases2)
rownames(cases_tot) <- seq(from = 1,
                           to = nrow(cases_tot),
                           by = 1)

write.csv(x = cases_tot,
          file = paste0(getwd(),
                        "/cases_tot.csv"),
          row.names = FALSE,
          quote = FALSE)
```
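For anyone unfamiliar with the `reshape()` step, here is the same long-to-wide transformation on a few rows. The dates, cantons and counts are invented for illustration; only the column names (`Date`, `canton`, `tested_pos`) follow the script above.

```r
# Toy long-format data shaped like the openZH file after subset():
# one row per (Date, canton) pair. Values are made up.
long <- data.frame(
  Date       = c("2020-03-01", "2020-03-01", "2020-03-02"),
  canton     = c("AG", "ZH", "AG"),
  tested_pos = c(1, 5, 2),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per Date, one "tested_pos.<canton>" column per canton
wide <- reshape(data = long,
                idvar = "Date",
                timevar = "canton",
                v.names = "tested_pos",
                direction = "wide")

# Strip the prefix literally (fixed = TRUE, so "." is not a regex wildcard)
names(wide) <- sub("tested_pos.", "", names(wide), fixed = TRUE)

# Date/canton combinations missing from the long data become NA; set them to 0
wide[is.na(wide)] <- 0
print(wide)
```

ZH has no row for 2020-03-02 in the toy data, so its cell is first `NA` and then 0, which is the same effect `cases[is.na(cases)] <- 0` has in the script.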
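When I run the script manually, everything works and the resulting .csv is fine. However, if I schedule the script with the cronR package (in the RStudio IDE, Addins -> Schedule R scripts on Linux/Unix), the saved .csv differs in exactly one respect: the "Date" column. The dates from the first dataset are in the first column, but the dates from the second dataset (the one bound to the first with bind_rows()) end up at the end of the dataset, under a new, strange header name (as you can see in this picture).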
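The symptom is consistent with how `dplyr::bind_rows()` combines data frames: columns are matched by name, a column missing from one input is filled with `NA`, and a column that only appears in a later input is added at the end. So if, in the cron run, the date header of `cases2` is read under a name other than `Date`, its dates would land in a new last column. The sketch below reproduces that in base R with a hypothetical helper `bind_by_name()` (an illustration, not dplyr's implementation) and a made-up header name `X.U.FEFF.Date`. It also includes a hypothetical `starts_with_utf8_bom()` check, because an invisible UTF-8 byte-order mark at the start of a downloaded CSV is one possible, unconfirmed, reason a header could be read differently in a cron job, which typically runs with a different environment (for example locale settings) than an interactive RStudio session.

```r
# Hypothetical helper mimicking the column matching of dplyr::bind_rows():
# columns are matched by name, a column missing from one input is filled with
# NA, and columns first seen in the second input are appended at the end.
bind_by_name <- function(a, b) {
  all_names <- union(names(a), names(b))  # order of first appearance
  cols <- lapply(all_names, function(nm) {
    col_a <- if (nm %in% names(a)) a[[nm]] else rep(NA, nrow(a))
    col_b <- if (nm %in% names(b)) b[[nm]] else rep(NA, nrow(b))
    c(col_a, col_b)
  })
  names(cols) <- all_names
  data.frame(cols, stringsAsFactors = FALSE, check.names = FALSE)
}

# Hypothetical check: TRUE if a string starts with the UTF-8 byte-order mark
# (bytes EF BB BF). It inspects raw bytes, so it does not depend on the locale.
starts_with_utf8_bom <- function(txt) {
  bytes <- charToRaw(txt)
  length(bytes) >= 3 && identical(bytes[1:3], as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Made-up data: the second frame's date column has a different (hypothetical) name
old <- data.frame(Date = c("2020-03-01", "2020-03-02"), AG = c(1, 2),
                  stringsAsFactors = FALSE)
new <- data.frame(X.U.FEFF.Date = "2020-03-03", AG = 3,
                  stringsAsFactors = FALSE)

# Names present in `new` but not in `old` would become extra columns
mismatch <- setdiff(names(new), names(old))
res <- bind_by_name(old, new)
print(mismatch)
print(res)
```

In the real script, logging `names(cases2)`, `setdiff(names(cases2), names(cases))`, `Sys.getlocale()` and `starts_with_utf8_bom()` applied to the text returned by `getURL()` for the second URL, both from an interactive run and from the scheduled run, would show whether the header really differs between the two.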
Do you know what the problem might be? Thanks a lot!
PS: I'm working on a late-2016 MacBook Pro with 8 GB of RAM, running macOS Catalina.