
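I have downloaded a JSON extract from BigQuery that contains nested and repeated fields (similar to the bigrquery package), and I am trying to further manipulate the resulting tibble.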

I have the following code to load the JSON and convert it to a tibble:

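library(tidyverse)
# temp.json holds one JSON object per line, so parse each line separately
ga.list <- lapply(readLines("temp.json"), jsonlite::fromJSON, flatten = TRUE)
ga.df <- tibble(dat = ga.list) %>%
  unnest_wider(dat) %>%          # one column per top-level field
  mutate(id = row_number()) %>%
  unnest_wider(b_nested) %>%     # then unpack the nested record level by level
  unnest_wider(b3) %>%
  unnest_wider(b33)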

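So there are two kinds of list columns:

  1. b_nested: this column is a nested list (I unnested it level by level... maybe there is a more automated way? If so, please advise!)
  2. rr1 and rr2: these columns always have the same number of elements, so element 1 of rr1 should be read together with element 1 of rr2, and so on.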
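For the "more automated way" asked about in point 1, one option is to keep calling tidyr::unnest_wider() until no list column of named lists (JSON objects) is left. This is a minimal sketch under a few assumptions: tidyr >= 1.2.0 (so that all_of() works in the col argument), a list column counts as an object column only if every element is a named list, and the helper names is_object_col() and unnest_objects() are hypothetical. rr1 and rr2 are left alone because their elements are unnamed vectors. names_sep = "." keeps the path in the column names (for example b_nested.b3.b33.b3311), so keys repeated at different levels cannot collide.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Two of the sample records from temp.json, one JSON object per string
json_lines <- c(
  '{"a":"4000","b_nested":{"b1":"(not set)","b2":"some - text","b3":{"b31":"1591558980","b32":"60259425255","b33":{"b3311":"133997175"},"b4":false},"b5":true},"rr1":[],"rr2":[]}',
  '{"a":"4000","b_nested":{"b1":"asdfasdfa","b2":"some - text more","b3":{"b31":"11111","b32":"2222","b33":{"b3311":"3333333"},"b4":true},"b5":true},"rr1":["v1","v2","v3"],"rr2":["x1","x2","x3"]}'
)
ga.list <- lapply(json_lines, jsonlite::fromJSON)

# Hypothetical helper: TRUE if every element of a list column is a named list
is_object_col <- function(col) {
  is.list(col) && length(col) > 0 &&
    all(map_lgl(col, ~ is.list(.x) && !is.null(names(.x))))
}

# Hypothetical helper: widen object columns one at a time until none remain
unnest_objects <- function(df) {
  repeat {
    targets <- names(df)[map_lgl(df, is_object_col)]
    if (length(targets) == 0) return(df)
    df <- unnest_wider(df, all_of(targets[1]), names_sep = ".")
  }
}

ga_flat <- tibble(dat = ga.list) %>%
  unnest_wider(dat) %>%
  mutate(id = row_number()) %>%
  unnest_objects()
```

If the leaf names are unique, as they are in this sample, dropping names_sep should give the same short names (b1, b31, b3311, ...) that the manual chain produces.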

I am still working out how to extract id, rr1 and rr2 into a long table, with one row per repeated element (so each id appears on several rows).

Note: this question has been edited several times as I made progress. Initially I was stuck on converting the JSON to a tibble, until I discovered unnest_wider().

temp.json

{"a":"4000","b_nested":{"b1":"(not set)","b2":"some - text","b3":{"b31":"1591558980","b32":"60259425255","b33":{"b3311":"133997175"},"b4":false},"b5":true},"rr1":[],"rr2":[]}
{"a":"4000","b_nested":{"b1":"asdfasdfa","b2":"some - text more","b3":{"b31":"11111","b32":"2222","b33":{"b3311":"3333333"},"b4":true},"b5":true},"rr1":["v1","v2","v3"],"rr2":["x1","x2","x3"]}
{"a":"6000","b_nested":{"b1":"asdfasdfa","b2":"some - text more","b3":{"b31":"11111","b32":"2222","b33":{"b3311":"3333333"},"b4":true},"b5":true},"rr1":["v1","v2","v3","v4","v5"],"rr2":["aja1","aja2","aja3","aja14","aja5"]}
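Because temp.json holds one JSON object per line (newline-delimited JSON), jsonlite can also read it in a single call instead of lapply(readLines(...)). The sketch below is an alternative to the approach in the question, not what the poster used: it writes two of the sample records to a temporary file, reads them with jsonlite::stream_in(), and then calls jsonlite::flatten() to turn the nested objects into dotted column names. The variable names (json_lines, path, ga_raw, ga_flat) are made up for illustration; rr1 and rr2 should stay list columns because their lengths vary from record to record.

```r
library(jsonlite)

# Toy copy of temp.json (two of the three records), one JSON object per line
json_lines <- c(
  '{"a":"4000","b_nested":{"b1":"(not set)","b2":"some - text","b3":{"b31":"1591558980","b32":"60259425255","b33":{"b3311":"133997175"},"b4":false},"b5":true},"rr1":[],"rr2":[]}',
  '{"a":"4000","b_nested":{"b1":"asdfasdfa","b2":"some - text more","b3":{"b31":"11111","b32":"2222","b33":{"b3311":"3333333"},"b4":true},"b5":true},"rr1":["v1","v2","v3"],"rr2":["x1","x2","x3"]}'
)
path <- tempfile(fileext = ".json")
writeLines(json_lines, path)

# stream_in() parses newline-delimited JSON into one data frame; nested objects
# become nested data-frame columns, variable-length arrays become list columns
ga_raw <- stream_in(file(path), verbose = FALSE)

# flatten() expands the nested data-frame columns into names like b_nested.b3.b31
ga_flat <- jsonlite::flatten(ga_raw)
```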


1 Answer


The last piece of the puzzle: to get one row per repeated record, with id repeated on each row:

ga.df %>%
  select(id, rr1, rr2) %>%
  unnest(cols = c(rr1, rr2))
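Note that unnest() drops any row whose rr1/rr2 elements have size 0, so the first record in temp.json (with "rr1":[] and "rr2":[]) would be dropped from the long table. If that id should be kept, tidyr's keep_empty = TRUE argument turns a size-0 element into a single row of NA. The toy tibble below is an assumption standing in for ga.df; because jsonlite parses [] as an empty list() rather than character(0), the sketch first converts each element with as.character() so that every element of rr1 and rr2 is a character vector.

```r
library(tibble)
library(tidyr)

# Toy stand-in for ga.df: id 1 mimics a record whose arrays were empty ([])
ga_small <- tibble(
  id  = 1:2,
  rr1 = list(list(), c("v1", "v2", "v3")),
  rr2 = list(list(), c("x1", "x2", "x3"))
)

# as.character(list()) is character(0), so every element becomes a character vector
ga_small$rr1 <- lapply(ga_small$rr1, as.character)
ga_small$rr2 <- lapply(ga_small$rr2, as.character)

# Default: the size-0 row (id 1) is dropped
long_default <- unnest(ga_small, cols = c(rr1, rr2))

# keep_empty = TRUE keeps id 1 as one row of NA values
long_all <- unnest(ga_small, cols = c(rr1, rr2), keep_empty = TRUE)
```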

For reference: link to the BigQuery documentation on specifying nested and repeated columns.

An alternative solution (the one I prefer) is to build a tibble from rr1 and rr2 and keep it as a column of ga.df, so that purrr functions can be applied to it:

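ga.df %>%
  # pair rr1 and rr2 element-wise into one small tibble per row
  mutate(rr = map2(rr1, rr2, function(x, y) {
    tibble(rr1 = x, rr2 = y)
  })) %>%
  select(-rr1, -rr2) %>%
  # number of repeated records in each row
  mutate(rr_length = map_int(rr, ~nrow(.x)))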
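To illustrate what the nested rr column buys, here is a small self-contained run of the same map2() pattern on toy data (the values and the extra pairs column are made up for illustration). Once each row carries its own tibble of (rr1, rr2) pairs, purrr functions can work on each record's pairs, and tidyr::unnest() still produces the long table when it is needed.

```r
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)

# Toy stand-in for ga.df after unnest_wider(): equal-length rr1/rr2 per row
ga_small <- tibble(
  id  = 1:2,
  rr1 = list(c("v1", "v2", "v3"), c("v4", "v5")),
  rr2 = list(c("x1", "x2", "x3"), c("x4", "x5"))
)

ga_nested <- ga_small %>%
  # same as function(x, y) tibble(rr1 = x, rr2 = y)
  mutate(rr = map2(rr1, rr2, ~ tibble(rr1 = .x, rr2 = .y))) %>%
  select(-rr1, -rr2) %>%
  mutate(rr_length = map_int(rr, nrow))

# purrr can now operate per record, e.g. pasting each rr1/rr2 pair together
ga_nested <- ga_nested %>%
  mutate(pairs = map(rr, ~ paste(.x$rr1, .x$rr2, sep = "=")))

# The long table is still one call away
ga_long <- unnest(select(ga_nested, id, rr), cols = rr)
```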
answered 2020-04-14T12:11:10.003