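I am trying to pull Census Bureau data for the entire US by county. Because of the size of the data, Census requires you to specify a "region" (i.e., a state or county) for the data import. I therefore need to loop over a list of all states (by FIPS code) to import all the data. The output I need is a separate data frame for each state, which I can then combine into one big data frame. Here is a sample of the code I wrote: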

library(censusapi)

states <- c("01","02")
for(i in 1:length(states)) {
   region = str_glue("state:{states[i]}")
   migr = str_glue("migr2010_{states[i]}")
   migr <- getCensus(name = "acs/flows", vintage = 2010,
                     key = "*myAPIkey*",
                     vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
                     region = "county:*", regionin = region)
}

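What I want is a separate data frame for each state, named "migr2010_01", "migr2010_02", and so on. What I actually get is a single data frame named "migr" that contains only the data for the last state in the list. I know something is wrong with my loop, but I am not sure where to make the change, as I am new to loops in R. Thanks for any ideas.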

3 Answers

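Simply turn your process into a function and pass it to lapply or, better, to sapply, which returns a named list here (since the input is a character vector). Reconsider saving many similarly structured objects separately, and use one named list of data frames instead. This avoids flooding the global environment unnecessarily: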

library(stringr)
library(censusapi)

states <- c("01","02")

# Fetch county-level ACS migration flows for one state FIPS code
get_census_data <- function(st) {
   region <- str_glue("state:{st}")

   # The value of the last expression is what the function returns
   getCensus(name = "acs/flows", vintage = 2010,
             key = "*myAPIkey*",
             vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
             region = "county:*", regionin = region)
}

df_list <- sapply(states, get_census_data, simplify=FALSE)
# df_list <- setNames(lapply(states, get_census_data), states)   # EQUIVALENT CALL

You lose no data frame functionality by storing the data frames in a list rather than as separate objects:

str(df_list$`01`)
head(df_list$`01`)
summary(df_list$`01`)

dim(df_list$`02`)
tail(df_list$`02`)
table(df_list$`02`)
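
To try the list-based pattern without an API key, the sketch below swaps getCensus for a hypothetical stand-in, fake_census, that returns a tiny made-up data frame per state. Only the sapply(..., simplify = FALSE) mechanics are the point; the column values are invented for illustration.

```r
# Hypothetical stand-in for getCensus(): returns a small made-up data frame
# so the example runs offline. The numbers are illustrative only.
fake_census <- function(st) {
  data.frame(state = st,
             county = c("001", "003"),
             MOVEDIN = c(10L, 20L),
             stringsAsFactors = FALSE)
}

states <- c("01", "02")

# simplify = FALSE keeps one list element per state; because `states` is a
# character vector, sapply() uses its values as the element names.
df_list <- sapply(states, fake_census, simplify = FALSE)

names(df_list)          # element names come from `states`
nrow(df_list[["02"]])   # each element is still an ordinary data frame
```

Indexing with df_list[["02"]] and df_list$`02` is equivalent; the backticks are only needed because the name starts with a digit.
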
answered 2018-07-11T17:39:19.660

This is answered in part by FAQ 7.21. The most important part of that answer is the end where it says that it is easier to just use a list.

Your code can be converted to something like:

library(censusapi)
library(stringr)

states <- c("01","02")
migr.list <- lapply(states, function(x) {
   region <- str_glue("state:{x}")
   # The data frame returned by getCensus becomes this list element
   getCensus(name = "acs/flows", vintage = 2010,
             key = "*myAPIkey*",
             vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
             region = "county:*", regionin = region)
})
names(migr.list) <- sprintf("migr2010_%s", states) # optional

Now migr.list will be a single list object with each element being the data frame returned by getCensus. If you want to combine these all together into 1 data frame you can use code like:

migr <- do.call(rbind, migr.list)

If you want to run the same code on each state separately then you can just use lapply or related functions. In the long run this will be much simpler and less error prone than using get and assign with loops.
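
One thing to watch when stacking: rbind does not add a column recording which list element each row came from. The sketch below, which uses a hypothetical fake_census stand-in rather than the real API, tags each data frame with its list name before combining. The real getCensus output may already carry a state column, in which case this step is unnecessary.

```r
# Hypothetical stand-in for getCensus(); values are made up.
fake_census <- function(st) {
  data.frame(county = c("001", "003"), MOVEDIN = c(10L, 20L))
}

states <- c("01", "02")
migr.list <- lapply(states, fake_census)
names(migr.list) <- sprintf("migr2010_%s", states)

# Tag every data frame with the name of its list element, then stack them.
tagged <- Map(function(df, nm) cbind(source = nm, df),
              migr.list, names(migr.list))
migr <- do.call(rbind, unname(tagged))  # unname() keeps plain row names

nrow(migr)           # rows from both states
unique(migr$source)  # one label per original list element
```
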

answered 2018-07-11T17:45:10.870

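Your existing code creates an object named migr and assigns it a string containing the name of the data.frame you want to create. You then overwrite that object with the data.frame pulled from Census. Every iteration of the loop overwrites migr, which is why only the data from the last iteration is kept, and only as a data.frame named migr.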
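
The overwriting can be reproduced without the API. This is a minimal sketch in which a one-column placeholder data frame stands in for the getCensus result (an assumption made purely so the example runs offline):

```r
states <- c("01", "02")

for (i in seq_along(states)) {
  migr <- paste0("migr2010_", states[i])  # migr holds a name (a string)...
  migr <- data.frame(state = states[i])   # ...then is replaced by the data
}

exists("migr2010_01")  # FALSE in a fresh session: no such object was created
migr$state             # only the last iteration's value survives
```
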

Instead, you need to use the assign command to assign the data pulled from Census to the name stored in migr, as follows:

library(censusapi)
library(stringr)  # provides str_glue()

states <- c("01","02")
for(i in 1:length(states)) {
   region = str_glue("state:{states[i]}")
   migr = str_glue("migr2010_{states[i]}")
   assign(
     x = migr,
     value = getCensus(name = "acs/flows", vintage = 2010,
                       key = "*myAPIkey*",
                       vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
                       region = "county:*", regionin = region)
   )
}
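
A self-contained sketch of what assign does here, using a hypothetical fake_census stand-in and a scratch environment so the example does not touch the global workspace. The mget call at the end collects the objects back by name, which hints at why a list is usually the simpler container to begin with.

```r
# Hypothetical stand-in for getCensus(); values are made up.
fake_census <- function(st) {
  data.frame(state = st, MOVEDIN = c(10L, 20L))
}

states <- c("01", "02")
scratch <- new.env()  # scratch environment used only for this illustration

for (st in states) {
  obj_name <- paste0("migr2010_", st)
  # assign() binds the value to the name held in obj_name,
  # rather than to a variable literally called obj_name.
  assign(obj_name, fake_census(st), envir = scratch)
}

sort(ls(scratch))  # one object per state

# Using the objects together later means looking them up by name again.
collected <- mget(paste0("migr2010_", states), envir = scratch)
```
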

Edit

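As others have mentioned, it is probably easier to work with a list of data.frames than to create many of them in the global environment. The simplest way to create one is with lapply, as follows: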

 migr2010 <- lapply(
   paste0("state:", c("01", "02")),  # replaces region in the original
   getCensus,
   name = "acs/flows",
   vintage = 2010,
   key = "*myAPIkey*",
   vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
   region = "county:*"
   )

Then, if you want to create a single data.frame from it, you can use dplyr::bind_rows(migr2010), data.table::rbindlist(migr2010), or do.call(rbind, migr2010) (though do.call is much slower than the other two).

answered 2018-07-11T17:06:07.150