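I am trying to write a data.frame from R to a JSON file, but in a hierarchical structure with child nodes. I found examples and the RJSONIO package, but I could not apply them to my case.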

This is the data.frame in R:

> DF
   Date_by_Month    CCG Year Month refYear      name OC_5a OC_5b OC_5c 
1     2010-01-01 MyTown 2010    01    2009 2009/2010     0    15    27 
2     2010-02-01 MyTown 2010    02    2009 2009/2010     1    14    22 
3     2010-03-01 MyTown 2010    03    2009 2009/2010     1     6    10 
4     2010-04-01 MyTown 2010    04    2010 2010/2011     0    10    10 
5     2010-05-01 MyTown 2010    05    2010 2010/2011     1    16     7 
6     2010-06-01 MyTown 2010    06    2010 2010/2011     0    13    25 

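Besides writing the data by month, I would also like to create an aggregated child node, "yearly", which holds the sum (for example) of all months belonging to that year. This is what I would like the JSON file to look like: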

[
    {
     "ccg":"MyTown",
     "data":[
            {"period":"yearly",
             "scores":[
                {"name":"2009/2010","refYear":"2009","OC_5a":2, "OC_5b": 35, "OC_5c": 59},
                {"name":"2010/2011","refYear":"2010","OC_5a":1, "OC_5b": 39, "OC_5c": 42}
             ]
             },
            {"period":"monthly",
             "scores":[
                {"name":"2009/2010","refYear":"2009","month":"01","year":"2010","OC_5a":0, "OC_5b": 15, "OC_5c": 27},
                {"name":"2009/2010","refYear":"2009","month":"02","year":"2010","OC_5a":1, "OC_5b": 14, "OC_5c": 22},
                {"name":"2009/2010","refYear":"2009","month":"03","year":"2010","OC_5a":1, "OC_5b": 6, "OC_5c": 10},
                {"name":"2010/2011","refYear":"2010","month":"04","year":"2010","OC_5a":0, "OC_5b": 10, "OC_5c": 10},
                {"name":"2010/2011","refYear":"2010","month":"05","year":"2010","OC_5a":1, "OC_5b": 16, "OC_5c": 7},
                {"name":"2010/2011","refYear":"2010","month":"06","year":"2010","OC_5a":0, "OC_5b": 13, "OC_5c": 25}
                ]
             }
            ]
    }
]

Thank you very much for your help!

2 Answers

Expanding on my comment:

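The jsonlite package has a lot of features, but what you describe does not really map onto a data frame, so I doubt any canned routine does this. Your best bet is probably to convert the data frame into a more general list (FYI, a data frame is stored internally as a list of columns) whose structure exactly matches that of the JSON, and then simply use a converter to translate it.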

In general this is complicated, but in your case it should be fairly simple. The list would be structured exactly like the JSON data:

list(
  list(
    ccg = "Town1",
    data = list(
      list(
        period = "yearly",
        scores = yearly_data_frame_town1
      ),
      list(
        period = "monthly",
        scores = monthly_data_frame_town1
      )
    )
  ),
  list(
    ccg = "Town2",
    data = list(
      list(
        period = "yearly",
        scores = yearly_data_frame_town2
      ),
      list(
        period = "monthly",
        scores = monthly_data_frame_town2
      )
    )
  )
)

Building this list should be a simple matter of looping over unique(DF$CCG) and using aggregate at each step to build the yearly data.
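
The loop described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name build_ccg and the variable score_cols are made up for the example, Month is read as character so that "01" keeps its leading zero, and jsonlite::toJSON with auto_unbox = TRUE is used for the final conversion (guarded, in case the package is not installed).

```r
# Sample data from the question; Month is kept as character to preserve "01"
DF <- read.table(text = "Date_by_Month CCG Year Month refYear name OC_5a OC_5b OC_5c
2010-01-01 MyTown 2010 01 2009 2009/2010 0 15 27
2010-02-01 MyTown 2010 02 2009 2009/2010 1 14 22
2010-03-01 MyTown 2010 03 2009 2009/2010 1 6 10
2010-04-01 MyTown 2010 04 2010 2010/2011 0 10 10
2010-05-01 MyTown 2010 05 2010 2010/2011 1 16 7
2010-06-01 MyTown 2010 06 2010 2010/2011 0 13 25",
  header = TRUE, stringsAsFactors = FALSE,
  colClasses = c(Month = "character"))

score_cols <- c("OC_5a", "OC_5b", "OC_5c")  # hypothetical name for the score columns

# Hypothetical helper: builds the nested list for a single CCG
build_ccg <- function(ccg, df) {
  rows <- df[df$CCG == ccg, ]
  # Yearly totals: sum the scores within each (name, refYear) group
  yearly <- aggregate(rows[score_cols],
                      by = rows[c("name", "refYear")],
                      FUN = sum)
  monthly <- rows[c("name", "refYear", "Month", "Year", score_cols)]
  rownames(monthly) <- NULL  # drop row names left over from subsetting
  list(
    ccg = ccg,
    data = list(
      list(period = "yearly",  scores = yearly),
      list(period = "monthly", scores = monthly)
    )
  )
}

# One list element per CCG, mirroring the top-level JSON array
nested <- lapply(unique(DF$CCG), build_ccg, df = DF)

# jsonlite writes each data frame as an array of row objects by default
if (requireNamespace("jsonlite", quietly = TRUE)) {
  cat(jsonlite::toJSON(nested, auto_unbox = TRUE, pretty = TRUE))
}
```

Note that numeric columns such as refYear come out as JSON numbers here, not as the quoted strings shown in the question; they would need an as.character() conversion first if strings are required.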

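If you need performance, look at the data.table or dplyr packages to do the looping and aggregation in one go. The former is flexible and fast, but somewhat esoteric. The latter has a relatively simple syntax and similar performance, but it is designed specifically around building pipelines for data frames, so it may take some hacking to get it to produce the right output format.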
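
As a rough idea of what the dplyr route might look like, the sketch below computes the yearly sums for all CCGs in a single grouped pass and then splits the result by CCG. It assumes dplyr 1.0 or later (for across()); the names score_cols, yearly_all and yearly_by_ccg are invented for the example, and the remaining list assembly would follow the structure shown earlier.

```r
library(dplyr)

# Sample data from the question
DF <- read.table(text = "Date_by_Month CCG Year Month refYear name OC_5a OC_5b OC_5c
2010-01-01 MyTown 2010 01 2009 2009/2010 0 15 27
2010-02-01 MyTown 2010 02 2009 2009/2010 1 14 22
2010-03-01 MyTown 2010 03 2009 2009/2010 1 6 10
2010-04-01 MyTown 2010 04 2010 2010/2011 0 10 10
2010-05-01 MyTown 2010 05 2010 2010/2011 1 16 7
2010-06-01 MyTown 2010 06 2010 2010/2011 0 13 25",
  header = TRUE, stringsAsFactors = FALSE)

score_cols <- c("OC_5a", "OC_5b", "OC_5c")  # hypothetical name

# One grouped pass gives the yearly sums for every CCG at once
yearly_all <- DF %>%
  group_by(CCG, name, refYear) %>%
  summarise(across(all_of(score_cols), sum), .groups = "drop")

# Split into one plain data frame per CCG, ready to slot into the nested list
yearly_by_ccg <- split(as.data.frame(yearly_all), yearly_all$CCG)
```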

answered 2015-02-26T13:50:19.280

It looks like ssdecontrol already has you covered... but here is my solution. You would need to loop over the unique CCGs and Years to create the whole dataset...

df <- read.table(textConnection("Date_by_Month    CCG Year Month refYear      name OC_5a OC_5b OC_5c 
2010-01-01 MyTown 2010    01    2009 2009/2010     0    15    27 
2010-02-01 MyTown 2010    02    2009 2009/2010     1    14    22 
2010-03-01 MyTown 2010    03    2009 2009/2010     1     6    10 
2010-04-01 MyTown 2010    04    2010 2010/2011     0    10    10 
2010-05-01 MyTown 2010    05    2010 2010/2011     1    16     7 
2010-06-01 MyTown 2010    06    2010 2010/2011     0    13    25"), stringsAsFactors=FALSE, header=TRUE)


library(RJSONIO)
to_list <- function(ccg, year){
  # Monthly rows for the requested CCG and calendar year
  df_monthly <- subset(df, CCG==ccg & Year==year)
  # Yearly totals: sum the scores of those rows within each name/refYear group
  df_yearly <- aggregate(df_monthly[,c("OC_5a", "OC_5b", "OC_5c")], df_monthly[,c("name", "refYear")], sum)
  l <- list("ccg"=ccg, 
            data=list(list("period" = "yearly",
                      "scores" = as.list(df_yearly)
                      ),
                      list("period" = "monthly",
                           "scores" = as.list(df_monthly[,c("name", "refYear", "OC_5a", "OC_5b", "OC_5c")])
                      )
            )
       )
  return(l)
}
toJSON(to_list("MyTown", "2010"), pretty=TRUE)

This returns:

{
    "ccg" : "MyTown",
    "data" : [
        {
            "period" : "yearly",
            "scores" : {
                "name" : [
                    "2009/2010",
                    "2010/2011"
                ],
                "refYear" : [
                    2009,
                    2010
                ],
                "OC_5a" : [
                    2,
                    1
                ],
                "OC_5b" : [
                    35,
                    39
                ],
                "OC_5c" : [
                    59,
                    42
                ]
            }
        },
        {
            "period" : "monthly",
            "scores" : {
                "name" : [
                    "2009/2010",
                    "2009/2010",
                    "2009/2010",
                    "2010/2011",
                    "2010/2011",
                    "2010/2011"
                ],
                "refYear" : [
                    2009,
                    2009,
                    2009,
                    2010,
                    2010,
                    2010
                ],
                "OC_5a" : [
                    0,
                    1,
                    1,
                    0,
                    1,
                    0
                ],
                "OC_5b" : [
                    15,
                    14,
                    6,
                    10,
                    16,
                    13
                ],
                "OC_5c" : [
                    27,
                    22,
                    10,
                    10,
                    7,
                    25
                ]
            }
        }
    ]
}
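
The output above is column-oriented (one array per column), whereas the question asks for one object per row. One possible way to get there, sketched under the assumption that each row is turned into its own named list before conversion, is shown below. The helper name df_to_records is invented for this example, and the small yearly data frame simply restates the sums from the output above.

```r
# Yearly sums as shown in the output above
yearly <- data.frame(
  name    = c("2009/2010", "2010/2011"),
  refYear = c(2009, 2010),
  OC_5a   = c(2, 1),
  OC_5b   = c(35, 39),
  OC_5c   = c(59, 42),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn a data frame into an unnamed list of
# per-row named lists, so each row becomes one JSON object
df_to_records <- function(d) {
  lapply(seq_len(nrow(d)), function(i) as.list(d[i, , drop = FALSE]))
}

records <- df_to_records(yearly)

# Serialise only if RJSONIO is available
if (requireNamespace("RJSONIO", quietly = TRUE)) {
  cat(RJSONIO::toJSON(list(period = "yearly", scores = records), pretty = TRUE))
}
```

Inside to_list, the two as.list(...) calls could then be swapped for this helper.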
answered 2015-02-26T14:10:45.653