
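I have a process that creates a df for a single weather station over a single month. However, I have about 25 stations for which I want precipitation data covering more than 5 years.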

I have the station IDs in a df like the table below (but with 23 more stations).

stationid           County
GHCND:USW00093721   ANNEARUNDEL
GHCND:USC00182308   BALTIMORE

I get the weather dataset with the following code:

library("rnoaa")
ANNEARUNDEL_2006 <- ncdc(datasetid='GHCND', stationid = "GHCND:USC00182060", datatypeid='PRCP', startdate = '2006-07-01', enddate = '2006-08-01', limit=400, token =  "API KEY") 

ANNEARUNDEL_2006 <- ANNEARUNDEL_2006$data

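I am familiar with very basic for loops that handle a single process. Is there a way to set up a loop that uses the county name and year to create a new df for each of the 25 stations for each year from 2006 to 2011? Is a loop the best way to accomplish this?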


3 Answers


You could do something like this: set up a function that reads in the data, then loop over the rows of your df with mapply and over the years with lapply. The output will be a named list of data (vectors as it stands, although you could capture more columns of df if you wanted, in which case they would be data frames).

getNCDC <- function(id,County,year){
  df <- ncdc(datasetid='GHCND', stationid = id, datatypeid='PRCP', startdate = paste0(year,'-07-01'), enddate = paste0(year,'-08-01'), limit=400, token =  "API KEY") 
  df <- list(df$data)
  names(df) <- paste(County,year,sep="_")
  return(df)
}

allData <- lapply(2006:2011,function(year) mapply(getNCDC,df$stationid,df$County,year))
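
The shape of the result can be checked offline with a minimal sketch. It assumes a hypothetical stand-in function, fake_fetch, in place of the ncdc() call (so no API token is needed), and it passes SIMPLIFY = FALSE so that mapply returns a plain list that can be named and flattened. The columns of the stand-in data frame are placeholders, not the real rnoaa output.

```r
# Hypothetical stand-in for the ncdc() call, so the sketch runs without a token.
# The returned columns are placeholders, not the real API response.
fake_fetch <- function(id, County, year) {
  data.frame(station = id,
             date = paste0(year, "-07-01"),
             value = 0,
             stringsAsFactors = FALSE)
}

stations <- data.frame(
  stationid = c("GHCND:USW00093721", "GHCND:USC00182308"),
  County = c("ANNEARUNDEL", "BALTIMORE"),
  stringsAsFactors = FALSE
)

years <- 2006:2011

# One list of per-station data frames for each year. SIMPLIFY = FALSE stops
# mapply from trying to collapse the results into a vector or matrix.
per_year <- lapply(years, function(year) {
  out <- mapply(fake_fetch, stations$stationid, stations$County, year,
                SIMPLIFY = FALSE, USE.NAMES = FALSE)
  names(out) <- paste(stations$County, year, sep = "_")
  out
})

# Flatten the list of lists into a single named list.
all_data <- do.call(c, per_year)
```

Each element can then be pulled out by name, for example all_data[["BALTIMORE_2008"]], which is close to the one-df-per-county-and-year layout the question asks for.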
Answered 2017-04-12T15:32:26.773

I like loops for this kind of task because they are easier to read and write. You can do it with two loops:

my_df <- read.table(text = "stationid   County
GHCND:USW00093721   ANNEARUNDEL
GHCND:USC00182308   BALTIMORE",
                    header = TRUE, stringsAsFactors = FALSE)

library(rnoaa)

results <- list() # list as storage variable for the loop results
i <- 1 # indexing variable

for(sid in unique(my_df$stationid)) { # each station in your stationid dataframe
    for(year in 2006:2011) { # each year you care about
        data <- ncdc(datasetid='GHCND', stationid = sid,
                     datatypeid='PRCP', startdate = paste0(year, '-01-01'),
                     enddate = paste0(year, '-12-31'), limit=400, token = "API KEY")$data # subset the returned list right away here with $data

        # add info from each loop iteration
        data$county <- my_df[my_df$stationid == sid,]$County
        data$year <- year

        results[[i]] <- data # store it
        i <- i + 1 # rinse and repeat
    }
}
one_big_df <- do.call(rbind, results) # stack all of the data frames together rowwise

Of course, you could always rewrite a for loop using lapply or its friends. If speed became an issue, you might want to consider it.
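
As a rough sketch of that rewrite, the two nested loops can be replaced by a single lapply over every station-year combination. The example assumes a hypothetical stand-in, fake_fetch, in place of ncdc(...)$data so that it runs without an API token; its columns are placeholders rather than the real response.

```r
# Hypothetical stand-in for ncdc(...)$data, so the sketch runs offline.
fake_fetch <- function(sid, year) {
  data.frame(station = sid,
             date = paste0(year, "-01-01"),
             value = 0,
             stringsAsFactors = FALSE)
}

my_df <- data.frame(
  stationid = c("GHCND:USW00093721", "GHCND:USC00182308"),
  County = c("ANNEARUNDEL", "BALTIMORE"),
  stringsAsFactors = FALSE
)

# Every station-year combination, one row each.
combos <- expand.grid(stationid = my_df$stationid, year = 2006:2011,
                      stringsAsFactors = FALSE)

# One data frame per combination, with county and year attached.
results <- lapply(seq_len(nrow(combos)), function(i) {
  sid <- combos$stationid[i]
  year <- combos$year[i]
  data <- fake_fetch(sid, year)
  data$county <- my_df$County[match(sid, my_df$stationid)]
  data$year <- year
  data
})

# Stack all of the data frames together row-wise.
one_big_df <- do.call(rbind, results)
```

This version needs no counter variable, since lapply collects the results itself. Whether it is faster than the loop is not something this sketch measures.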

Answered 2017-04-12T15:28:55.413

The following solution uses functions from the rnoaa and tidyverse packages.

Notice that I used ghcnd_search to download the precipitation data, and that the station IDs in the example below are given without the GHCND: prefix.

# Load packages
library(rnoaa)
library(tidyverse)

# Create example data frame
sample_df <- data.frame(stationid = c("USW00093721", "USC00182308"),
                        County = c("ANNEARUNDEL", "BALTIMORE"),
                        stringsAsFactors = FALSE)

# Download the data for each station using map
data_list <- map(sample_df$stationid, ghcnd_search, 
                 date_min = "2006-01-01", date_max = "2011-12-31", var = "prcp")

Now the prcp data from each station have been downloaded as a data frame. They are all stored in data_list as a list.

You can access the data of each station by accessing the list, or you can convert the data in the list to a single data frame. Here is an example:

# Transpose the data_list. This turns a list-of-lists "inside-out"
data_list2 <- transpose(data_list)

# Combine all data to a single data frame
data_df <- bind_rows(data_list2$prcp)

Now all the data are in data_df as a single data frame.
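
Since the question asks for results organised by county and year, one possible follow-up is to join the county names back onto the combined data. The sketch below is an assumption-laden illustration: it uses a small hand-made stand-in for data_df with columns id, prcp and date (assumed here to match what ghcnd_search returns), and the precipitation values are placeholders, not real measurements.

```r
# Stand-in for data_df. The columns id, prcp and date are assumed,
# and the values are placeholders rather than downloaded data.
data_df <- data.frame(
  id = c("USW00093721", "USW00093721", "USC00182308"),
  prcp = c(0, 12, 5),
  date = as.Date(c("2006-01-01", "2006-01-02", "2006-01-01")),
  stringsAsFactors = FALSE
)

sample_df <- data.frame(stationid = c("USW00093721", "USC00182308"),
                        County = c("ANNEARUNDEL", "BALTIMORE"),
                        stringsAsFactors = FALSE)

# Left join on the station ID so every observation keeps its county.
data_with_county <- merge(data_df, sample_df,
                          by.x = "id", by.y = "stationid", all.x = TRUE)

# Derive the year from the date so the data can be split by county and year.
data_with_county$year <- as.integer(format(data_with_county$date, "%Y"))
```

From here, split(data_with_county, list(data_with_county$County, data_with_county$year)) would give one data frame per county and year, if separate objects are really needed.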

Answered 2017-04-12T15:39:55.803