
Is there a clean way to change the default behavior of the "/json" postfix for data.frames from row-based to column-based?

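If I understand correctly, data.frames in R are really just named lists in which every element has the same length as the others. With jsonlite, the difference is easy to show (a trivial example, yes):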

library(jsonlite)
ll <- list(xx=1:3, yy=6:8)
dd <- data.frame(xx=1:3, yy=6:8)
toJSON(dd)
# [1] "[ { \"xx\" : 1, \"yy\" : 6 }, { \"xx\" : 2, \"yy\" : 7 }, { \"xx\" : 3, \"yy\" : 8 } ]"
toJSON(ll)
# [1] "{ \"xx\" : [ 1, 2, 3 ], \"yy\" : [ 6, 7, 8 ] }"
toJSON(dd, dataframe='column')
# [1] "{ \"xx\" : [ 1, 2, 3 ], \"yy\" : [ 6, 7, 8 ] }"
toJSON(as.list(dd))
# [1] "{ \"xx\" : [ 1, 2, 3 ], \"yy\" : [ 6, 7, 8 ] }"

The last three are identical. I can get column-based output either by using the dataframe argument of toJSON or by coercing the data.frame to a list.

With OpenCPU's API, the calls look similar:

$ curl http://localhost:7177/ocpu/library/base/R/list/json -H "Content-Type: application/json" -d '{ "xx":[1,2,3], "yy":[6,7,8] }'
{
        "xx" : [
                1,
                2,
                3
        ],
        "yy" : [
                6,
                7,
                8
        ]
}

$ curl http://localhost:7177/ocpu/library/base/R/data.frame/json -H "Content-Type: application/json" -d '{ "xx":[1,2,3], "yy":[6,7,8] }'
[
        {
                "xx" : 1,
                "yy" : 6
        },
        {
                "xx" : 2,
                "yy" : 7
        },
        {
                "xx" : 3,
                "yy" : 8
        }
]

If I want the data.frame itself to come back as column-based JSON, I need to coerce it to a list:

$ curl http://localhost:7177/ocpu/library/base/R/data.frame -H "Content-Type: application/json" -d '{ "xx":[1,2,3], "yy":[6,7,8] }'
/ocpu/tmp/x000a0fb8/R/.val
/ocpu/tmp/x000a0fb8/stdout
/ocpu/tmp/x000a0fb8/source
/ocpu/tmp/x000a0fb8/console
/ocpu/tmp/x000a0fb8/info

$ curl http://localhost:7177/ocpu/library/base/R/as.list/json -d "x=x000a0fb8"
{
        "xx" : [
                1,
                2,
                3
        ],
        "yy" : [
                6,
                7,
                8
        ]
}

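Three questions:

  1. Is there a way to change the default behavior of OpenCPU's automatic JSON conversion to column-based?

  2. Is there a reason (other than "it had to default to something") that it defaults to row-based? (I ask so that I can better understand the rationale and the efficiency trade-offs, not to challenge the choice.)

  3. This is all academic, though, since most (if not all) libraries that accept JSON output will transparently understand and convert either format. Right?

(Win7 x64, R 3.0.3, opencpu 1.2.3, jsonlite 0.9.4)

(PS: Thanks Jeroen, OpenCPU is awesome! The more I play with it, the more I like it.)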


2 Answers


For dataframe objects you can use HTTP GET and set the dataframe argument:

GET http://localhost:7177/ocpu/tmp/x000a0fb8/json?dataframe=rows

For example the Boston object from the MASS package is a dataframe as well:

https://cran.ocpu.io/MASS/data/Boston/json?dataframe=columns
https://cran.ocpu.io/MASS/data/Boston/json?dataframe=rows

For HTTP GET requests to a .../json endpoint, all the HTTP parameters are mapped to arguments of the toJSON function from the jsonlite package. You can also specify other toJSON arguments:

https://cran.ocpu.io/MASS/data/Boston/json?dataframe=columns&digits=4

To see which arguments are available, have a look at the jsonlite manual or this post.

Note that this only works if you use the two-step procedure: first an HTTP POST on a function that returns a dataframe, then an HTTP GET that retrieves that object in JSON format. You cannot specify toJSON parameters when you use the one-step shortcut where you postfix the POST request with /json, because in POST requests the HTTP parameters always get mapped to the function call.
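
The two-step procedure could be scripted from JavaScript roughly as follows. This is a minimal sketch, not an official client: the helper names (findValPath, buildJsonUrl, postThenGetJson) are hypothetical, it assumes the POST response body lists the session paths one per line as in the question's output, and it assumes the stored return value can be read as JSON by appending /json to the .../R/.val path.

```javascript
// Pick the path of the stored return value (".../R/.val") out of the
// plain-text body returned by the POST, which lists one path per line.
function findValPath(body) {
  const lines = body.split(/\r?\n/).map((line) => line.trim());
  const valPath = lines.find((line) => line.endsWith("/R/.val"));
  if (!valPath) {
    throw new Error("no /R/.val path found in POST response");
  }
  return valPath;
}

// Build the GET URL. Every key in `options` becomes a query parameter,
// which the /json endpoint maps to a toJSON argument (dataframe, digits, ...).
function buildJsonUrl(baseUrl, valPath, options) {
  const query = new URLSearchParams(options).toString();
  const url = baseUrl.replace(/\/+$/, "") + valPath + "/json";
  return query ? url + "?" + query : url;
}

// Step 1: POST the function call. Step 2: GET the stored result as JSON.
// Assumes a runtime with a global fetch (modern browsers, Node 18+).
async function postThenGetJson(baseUrl, functionPath, args, options) {
  const post = await fetch(baseUrl.replace(/\/+$/, "") + functionPath, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(args),
  });
  if (!post.ok) {
    throw new Error("POST failed with status " + post.status);
  }
  const valPath = findValPath(await post.text());
  const get = await fetch(buildJsonUrl(baseUrl, valPath, options));
  if (!get.ok) {
    throw new Error("GET failed with status " + get.status);
  }
  return get.json();
}
```

With the server from the question, a call such as postThenGetJson("http://localhost:7177", "/ocpu/library/base/R/data.frame", { xx: [1, 2, 3], yy: [6, 7, 8] }, { dataframe: "columns" }) would be expected to resolve to the column-based object.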

The reason for this default is that the row based design seems to be the most conventional and interoperable way of encoding tabular data. The jsonlite paper/vignette goes into some more detail. Note that it also works the other way around: you don't have to call the data.frame function to create a dataframe, just posting an argument in the form:

[{"xx":1,"yy":6},{"xx":2,"yy":7},{"xx":3,"yy":8}]

will automatically turn it into a data frame:

curl https://public.opencpu.org/ocpu/library/base/R/summary/console -d object='[{"xx":1,"yy":6},{"xx":2,"yy":7},{"xx":3,"yy":8}]'
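
If the client holds its data in column form, it can be reshaped into this row-based array before posting. A minimal sketch, assuming all columns have the same length; columnsToRows is a hypothetical helper name, not part of OpenCPU or jsonlite.

```javascript
// Convert { xx: [1, 2, 3], yy: [6, 7, 8] } into an array of row records.
function columnsToRows(columns) {
  const names = Object.keys(columns);
  const length = names.length === 0 ? 0 : columns[names[0]].length;

  // A data frame needs equally long columns, so reject ragged input.
  for (const name of names) {
    if (columns[name].length !== length) {
      throw new Error("column '" + name + "' has a different length");
    }
  }

  const rows = [];
  for (let i = 0; i < length; i++) {
    const row = {};
    for (const name of names) {
      row[name] = columns[name][i];
    }
    rows.push(row);
  }
  return rows;
}

const exampleRows = columnsToRows({ xx: [1, 2, 3], yy: [6, 7, 8] });
console.log(JSON.stringify(exampleRows));
// prints [{"xx":1,"yy":6},{"xx":2,"yy":7},{"xx":3,"yy":8}]
```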
answered 2014-03-28T20:45:07.333

If you want to avoid the GET request, you can do the conversion in JavaScript:

var df = [
        {"id":1,"Sepal.Length":5.1,"Sepal.Width":3.5,"Petal.Length":1.4,"Petal.Width":0.2,"Species":"setosa"},
        {"id":2,"Sepal.Length":4.9,"Sepal.Width":3,"Petal.Length":1.4,"Petal.Width":0.2,"Species":"setosa"},
        {"id":3,"Sepal.Length":4.7,"Sepal.Width":3.2,"Petal.Length":1.3,"Petal.Width":0.2,"Species":"setosa"}
        ]

var columns = Object.keys(df[0]);
var dfcolumns = {};
for (var i = 0; i < columns.length; i++) {
    var column = [];
    var colname = columns[i];
    for (var j = 0; j < df.length; j++) {
        column.push(df[j][colname]);
    }
    dfcolumns[colname] = column;
}

console.log(dfcolumns);

Result:

{ id: [ 1, 2, 3 ],
  'Sepal.Length': [ 5.1, 4.9, 4.7 ],
  'Sepal.Width': [ 3.5, 3, 3.2 ],
  'Petal.Length': [ 1.4, 1.4, 1.3 ],
  'Petal.Width': [ 0.2, 0.2, 0.2 ],
  Species: [ 'setosa', 'setosa', 'setosa' ] }
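
The same conversion can be wrapped in a reusable function. The sketch below is one possible variant, not the code above verbatim: rowsToColumns is a hypothetical name, and it collects column names from every row rather than only the first, filling gaps with null. That may matter when some records lack a key, which, as far as I know, is how jsonlite's row-based output represents NA values by default.

```javascript
// Convert an array of row records into an object of column arrays.
function rowsToColumns(rows) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const columns = {};

  // First pass: register every column name, in order of first appearance.
  for (const row of rows) {
    for (const name of Object.keys(row)) {
      if (!has(columns, name)) {
        columns[name] = [];
      }
    }
  }

  // Second pass: append one value per row to every column,
  // using null where a row does not carry that key.
  for (const row of rows) {
    for (const name of Object.keys(columns)) {
      columns[name].push(has(row, name) ? row[name] : null);
    }
  }
  return columns;
}

const exampleColumns = rowsToColumns([
  { xx: 1, yy: 6 },
  { xx: 2 },
  { xx: 3, yy: 8 },
]);
console.log(JSON.stringify(exampleColumns));
// prints {"xx":[1,2,3],"yy":[6,null,8]}
```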
answered 2016-08-28T10:16:44.790