r - R: Linking 2 objects using dplyr (GitHub stats API example)

Question

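This is the structure of the GitHub stats API data for a repository. I am using the dplyr and tidyjson libraries to list, for each user in the repository, the number of commits ("c"), deletions ("d"), lines of code added ("a"), and the corresponding week ("w").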

      {
        "total": 5,
        "weeks": [
          {
            "w": 1428192000,
            "a": 0,
            "d": 0,
            "c": 0
          },
          {
            "w": 1428796800,
            "a": 0,
            "d": 0,
            "c": 0
          }
        ],
        "author": {
          "login": "ttuser1234",
          "id": 111111111
        }
      },
      {
        "total": 18,
        "weeks": [    
          {
            "w": 1428192000,
            "a": 212,
            "d": 79,
            "c": 5
          },
          {
            "w": 1428796800,
            "a": 146,
            "d": 67,
            "c": 1
          }
        ],
        "author": {
          "login": "coder1234",
          "id": 22222222
        }
      }
}

I can extract the weeks data and the author data separately, but I cannot join them together.

inp_file=read_json("The JSON file")
dat=as.tbl_json(inp_file)
dat%>%
  enter_object("weeks") %>%
  gather_array %>%
  spread_values(week=jstring("w"),add=jstring("a"),del=jstring("d"),comm=jstring("c"))


dat %>%
  enter_object("author") %>%
  spread_values(handle=jstring("login"))

At no point can I jump from the author object to the weeks object to link the 2 of them. Is there any way to do this? Any help is appreciated.

score 0 · Accepted Answer

A solution with tidyjson. It looks like your JSON is slightly malformed; it should probably be an array? A fixed version is below.

This uses the development version from devtools::install_github('jeremystan/tidyjson')

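In any case, there is no need to call enter_object for both objects. Instead, you can use a more complex path to grab the author's handle before entering the weeks object.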

    json <- '[
    {
        "total": 5,
        "weeks": [
            {
                "w": 1428192000,
                "a": 0,
                "d": 0,
                "c": 0
            },
            {
                "w": 1428796800,
                "a": 0,
                "d": 0,
                "c": 0
            }
        ],
        "author": {
            "login": "ttuser1234",
            "id": 111111111
        }
    },
    {
        "total": 18,
        "weeks": [
            {
                "w": 1428192000,
                "a": 212,
                "d": 79,
                "c": 5
            },
            {
                "w": 1428796800,
                "a": 146,
                "d": 67,
                "c": 1
            }
        ],
        "author": {
            "login": "coder1234",
            "id": 22222222
        }
    }
]'

    library(dplyr)
    library(tidyjson)

    json %>% as.tbl_json %>%
        gather_array() %>%
        spread_values(handle = jstring('author', 'login')) %>%  # useful tip: a path reaches into the nested author object
        enter_object("weeks") %>%
        gather_array %>%
        spread_values(week = jstring("w"), add = jstring("a"),
                      del = jstring("d"), comm = jstring("c"))

# A tbl_json: 4 x 8 tibble with a "JSON" attribute
#   `attr(., "JSON")` document.id array.index     handle array.index.2       week   add   del  comm
#               <chr>       <int>       <int>      <chr>         <int>      <chr> <chr> <chr> <chr>
#1 {"w":1428192000...           1           1 ttuser1234             1 1428192000     0     0     0
#2 {"w":1428796800...           1           1 ttuser1234             2 1428796800     0     0     0
#3 {"w":1428192000...           1           2  coder1234             1 1428192000   212    79     5
#4 {"w":1428796800...           1           2  coder1234             2 1428796800   146    67     1

Of course, you can always split the data into two separate pipelines, but this seems like the better solution for this example.
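For comparison, the two-pipeline variant could look roughly like the sketch below. It assumes a recent CRAN release of tidyjson (where `as_tibble()` returns a plain tibble) together with dplyr. The column names `author.index` and `week.index` and the compacted `json` string are chosen here for illustration and are not part of the original answer.

```r
library(tidyjson)
library(dplyr)

# Compact copy of the sample data so the sketch runs on its own
json <- '[
  {"total": 5,
   "weeks": [{"w": 1428192000, "a": 0, "d": 0, "c": 0},
             {"w": 1428796800, "a": 0, "d": 0, "c": 0}],
   "author": {"login": "ttuser1234", "id": 111111111}},
  {"total": 18,
   "weeks": [{"w": 1428192000, "a": 212, "d": 79, "c": 5},
             {"w": 1428796800, "a": 146, "d": 67, "c": 1}],
   "author": {"login": "coder1234", "id": 22222222}}
]'

# Shared first step: one row per top-level array element (one per author)
per_author <- json %>%
  as.tbl_json() %>%
  gather_array(column.name = "author.index")

# Pipeline 1: author handles
authors <- per_author %>%
  spread_values(handle = jstring("author", "login")) %>%
  as_tibble()

# Pipeline 2: weekly stats, read as numbers rather than strings
weeks <- per_author %>%
  enter_object("weeks") %>%
  gather_array(column.name = "week.index") %>%
  spread_values(week = jnumber("w"), add = jnumber("a"),
                del = jnumber("d"), comm = jnumber("c")) %>%
  as_tibble()

# Link the two results on the indices that both pipelines carry along
linked <- inner_join(authors, weeks, by = c("document.id", "author.index"))
```

The join works because both pipelines start from the same `gather_array()` step, so `document.id` and `author.index` identify the same author in each result. The single pipeline avoids this bookkeeping, which is why it reads better here.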

score 0 · Accepted Answer

tidyjson is nice, but I'm not sure it is necessary in this case. Here is one way to achieve what I think is the intended result.

library(jsonlite)
library(dplyr)

df1 <- fromJSON(
  '
[
{
"total": 5,
"weeks": [
{
  "w": 1428192000,
  "a": 0,
  "d": 0,
  "c": 0
},
  {
  "w": 1428796800,
  "a": 0,
  "d": 0,
  "c": 0
  }
],
  "author": {
  "login": "ttuser1234",
  "id": 111111111
  }
  },
  {
  "total": 18,
  "weeks": [    
  {
  "w": 1428192000,
  "a": 212,
  "d": 79,
  "c": 5
  },
  {
  "w": 1428796800,
  "a": 146,
  "d": 67,
  "c": 1
  }
  ],
  "author": {
  "login": "coder1234",
  "id": 22222222
  }
  }
]
'
)

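# fromJSON() makes "weeks" a list column of data.frames (one per author)
# and "author" a nested data.frame with one row per author.
# To join them, repeat each author row once per week of that author,
# so every stacked week row stays aligned with its own author.

n_weeks <- vapply(df1$weeks, nrow, integer(1))

df_joined <- data.frame(
  df1$author[rep(seq_len(nrow(df1)), n_weeks), ],
  bind_rows(df1$weeks),
  row.names = NULL
)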
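The same join can probably be written more compactly with tidyr, which the answer above does not use. The following is a minimal sketch, assuming tidyr 1.0 or later is installed; the names `flat` and `df_unnested` and the compacted `json` string are illustrative only.

```r
library(jsonlite)
library(tidyr)

# Compact copy of the sample data so the sketch runs on its own
json <- '[
  {"total": 5,
   "weeks": [{"w": 1428192000, "a": 0, "d": 0, "c": 0},
             {"w": 1428796800, "a": 0, "d": 0, "c": 0}],
   "author": {"login": "ttuser1234", "id": 111111111}},
  {"total": 18,
   "weeks": [{"w": 1428192000, "a": 212, "d": 79, "c": 5},
             {"w": 1428796800, "a": 146, "d": 67, "c": 1}],
   "author": {"login": "coder1234", "id": 22222222}}
]'

# flatten = TRUE turns the nested "author" object into author.login / author.id
# columns; "weeks" stays a list column holding one data.frame per author
flat <- fromJSON(json, flatten = TRUE)

# unnest() expands each author's weeks in place, so every week row
# keeps the author columns of the row it came from
df_unnested <- unnest(flat, cols = weeks)
```

Because `unnest()` expands the list column row by row, there is no need to track how many weeks each author has.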
