
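I created a script that reads data from two GitHub repositories, reformats the datasets, binds them together by row, and writes everything to a new .csv file. I then scheduled the script to run every hour using the cronR package.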

Here is my code:

devtools::install_github("tidyverse/googlesheets4")

library(dplyr)
library(googlesheets4)
library(RCurl)

setwd(dir = "YOUR_WORKING_DIRECTORY")

###############################################################################
#================== TIME SERIES DATA FOR CASES AND DEATHS ====================#
###############################################################################

# 1. #####==== DATASETS =====#####

# 1.1 ###= Cases #####

# These files are updated on GitHub every day.
cases <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/openZH/covid_19/master/COVID19_Cases_Cantons_CH_total.csv"),
                  header = TRUE,
                  stringsAsFactors = FALSE,
                  na.strings = c("", "NA"),
                  encoding = "UTF-8")

# Remove the rows for Switzerland as a whole (CH) and Liechtenstein (FL)
cases <- subset(x = cases,
                !is.element(el = canton,
                            set = c("CH", "FL")),
                select = c("date",
                           "canton",
                           "tested_pos"))

names(cases)[1] <- "Date"

# Reshape from long to wide format (one column per canton) so the layout
# matches the second cases dataset
cases <- reshape(data = cases,
                 idvar = "Date",
                 timevar = "canton",
                 v.names = "tested_pos",
                 direction = "wide")

names(cases) <- gsub(pattern = "tested_pos.",
                     replacement = "",
                     x = names(cases))

cases[is.na(cases)] <- 0

cases <- cases[order(cases$Date,
                     decreasing = FALSE), ]

# Second dataset, updated more frequently
cases2 <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/daenuprobst/covid19-cases-switzerland/master/covid19_cases_switzerland.csv"),
                   header = TRUE,
                   stringsAsFactors = FALSE,
                   na.strings = c("", "NA"),
                   encoding = "UTF-8")

# Remove total daily cases for Switzerland
cases2 <- subset(x = cases2,
                 select = -c(CH))

# rbind between two cases datasets
cases_tot <- bind_rows(cases[1:7, ],
                       cases2)

rownames(cases_tot) <- seq(from = 1,
                           to = nrow(cases_tot),
                           by = 1)

write.csv(x = cases_tot,
          file = paste0(getwd(),
                        "/cases_tot.csv"),
          row.names = FALSE,
          quote = FALSE)

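When I run the script manually, everything works and the resulting .csv is fine. However, when the script is scheduled through the cronR package (in the RStudio IDE, click Addins -> Schedule R scripts on Linux/Unix), the saved .csv differs, but only in the "Date" column. The dates from the first dataset are in the first column, but the dates from the second dataset (bound to the first with bind_rows()) end up in a separate column at the end of the dataset, and that column's header has a new, strange name (as you can see in this image).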
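Since bind_rows() matches columns by name, this looks as if the first column of the second dataset is not called "Date" when the script runs under cron. I have not confirmed why that would happen. A minimal sketch of a check that could go just before the bind is below. It uses base R only; the two toy data frames and the header name `X.U.FEFF.Date` are hypothetical stand-ins, not the real data or the real header.

```r
# Toy stand-ins for the two data frames. The header "X.U.FEFF.Date" is a
# hypothetical example of a mangled first-column name, not the actual one.
cases <- data.frame(Date = c("2020-03-01", "2020-03-02"),
                    ZH = c(1, 2),
                    stringsAsFactors = FALSE)
cases2 <- data.frame(X.U.FEFF.Date = c("2020-03-03", "2020-03-04"),
                     ZH = c(3, 4),
                     stringsAsFactors = FALSE)

# Columns that exist in only one of the two data frames
only_in_cases  <- setdiff(names(cases), names(cases2))
only_in_cases2 <- setdiff(names(cases2), names(cases))
print(only_in_cases)
print(only_in_cases2)

# Print the locale, so the interactive run and the cron run can be compared
print(Sys.getlocale())

# Possible workaround (an assumption, not a confirmed fix): force the name
# of the first column before binding, so the date columns line up again
names(cases2)[1] <- "Date"
cases_tot <- rbind(cases, cases2)
print(cases_tot)
```

Running this both manually and from the scheduled job, and comparing the printed names and locale in the cron log, should show whether the header is read differently in the two environments.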

Do you know what the problem might be? Thank you very much!

PS: I am working on a late-2016 MacBook Pro with 8 GB of RAM, running macOS Catalina.
