
I'm struggling to parse JSON in R that contains line breaks both inside strings and between key/value pairs (and between whole objects).

Here is the kind of format I mean:

{
    "id": 123456,
    "name": "Try to parse this",
    "description": "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
}
{
    "id": 987654,
    "name": "Have another go",
    "description": "Another two line description... \r\n With 2 lines."
}

Suppose I save this JSON as example.json. I have tried various techniques suggested elsewhere on SO to get around the parsing problem. None of the following work:

library(jsonlite)

# read the file either line by line or as one collapsed string
foo <- readLines("example.json")
foo <- paste(readLines("example.json"), collapse = "")

# each of these attempts fails on the file
bar <- fromJSON(foo)
bar <- jsonlite::stream_in(textConnection(foo))
bar <- purrr::map(foo, jsonlite::fromJSON)
bar <- ndjson::stream_in(textConnection(foo))
bar <- read_json(textConnection(foo), format = "jsonl")

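I believe this is indeed NDJSON, but none of the dedicated packages can handle it. Some people suggest streaming the data in with jsonlite or ndjson (as in several other answers). Others suggest mapping a function across the lines (or doing the equivalent in base R).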

Every attempt throws one of the following errors:

Error: parse error: trailing garbage
Error: parse error: premature EOF

or complains about a problem opening the text connection.

Does anyone have a solution?


1 Answer


Edit

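Knowing that the JSON is malformed, we lose some of the efficiency of ndjson, but I think we can fix it on the fly, assuming that each object clearly ends with a closing brace (}) followed by nothing or only whitespace (including newlines) and then an opening brace ({).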

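fn <- "~/StackOverflow/TomWagstaff.json"
# read the whole file and join all lines into a single string
wrongjson <- paste(readLines(fn), collapse = "")
# if objects are simply concatenated ("}", optional whitespace, "{"),
# insert commas between them and wrap everything in [...] to get a JSON array
if (grepl("\\}\\s*\\{", wrongjson))
  wrongjson <- paste0("[", gsub("\\}\\s*\\{", "},{", wrongjson), "]")
json <- jsonlite::fromJSON(wrongjson, simplifyDataFrame = FALSE)
str(json)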
# List of 2
#  $ :List of 3
#   ..$ id         : int 123456
#   ..$ name       : chr "Try to parse this"
#   ..$ description: chr "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
#  $ :List of 3
#   ..$ id         : int 987654
#   ..$ name       : chr "Have another go"
#   ..$ description: chr "Another two line description... \r\n With 2 lines."

From here you can continue as in the original answer below, for example:

txtjson <- paste(sapply(json, jsonlite::toJSON, pretty = TRUE), collapse = "\n")
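
The regex fix above will also fire on a } followed by { that sits inside a string value (for example a name such as "a } { b"). If that is a concern, a minimal base-R sketch that splits on top-level brace depth instead, skipping over string literals, could look like the following. split_json_objects is a hypothetical helper name, not part of jsonlite, and the character-by-character loop will be slow on large files; treat it as an illustration rather than a tuned parser.

```r
# Hypothetical helper (not part of jsonlite): split text containing several
# concatenated JSON objects into one string per top-level object.
# It tracks brace depth and skips over string literals, so "{" or "}"
# inside a quoted value cannot trigger a false split.
split_json_objects <- function(txt) {
  chars <- strsplit(txt, "")[[1]]
  objects <- character(0)
  depth <- 0L
  in_string <- FALSE
  escaped <- FALSE
  start <- NA_integer_
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (in_string) {
      # inside a string, only an unescaped double quote ends it
      if (escaped) {
        escaped <- FALSE
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_string <- FALSE
      }
    } else if (ch == "\"") {
      in_string <- TRUE
    } else if (ch == "{") {
      # remember where a top-level object starts
      if (depth == 0L) start <- i
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      # back at depth 0: one complete object has been read
      if (depth == 0L) {
        objects <- c(objects, paste(chars[start:i], collapse = ""))
      }
    }
  }
  objects
}
```

Each piece can then be parsed on its own, e.g. json <- lapply(split_json_objects(paste(readLines(fn), collapse = "\n")), jsonlite::fromJSON, simplifyVector = FALSE), which should give the same two-element list as the str() output above.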

(What follows is the original answer, written in the hope/assumption that the format was somehow legitimate.)


Assuming your data actually looks like this:

{"id":123456,"name":"Try to parse this","description":"Thought reading a JSON was easy? \r\n Try parsing a newline within a string."}
{"id": 987654,"name":"Have another go","description":"Another two line description... \r\n With 2 lines."}

then it is ndjson, as you suspected. From there you can do:

fn <- "~/StackOverflow/TomWagstaff.json"
json <- jsonlite::stream_in(file(fn), simplifyDataFrame = FALSE)
# opening file input connection.
#  Imported 2 records. Simplifying...
# closing file input connection.
str(json)
# List of 2
#  $ :List of 3
#   ..$ id         : int 123456
#   ..$ name       : chr "Try to parse this"
#   ..$ description: chr "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
#  $ :List of 3
#   ..$ id         : int 987654
#   ..$ name       : chr "Have another go"
#   ..$ description: chr "Another two line description... \r\n With 2 lines."

Note that I did not simplify to a data frame. To get the literal text blocks on the console, do

cat(sapply(json, jsonlite::toJSON, pretty = TRUE), sep = "\n")
# {
#   "id": [123456],
#   "name": ["Try to parse this"],
#   "description": ["Thought reading a JSON was easy? \r\n Try parsing a newline within a string."]
# }
# {
#   "id": [987654],
#   "name": ["Have another go"],
#   "description": ["Another two line description... \r\n With 2 lines."]
# }

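If you want to dump it to a file this way (although nothing in jsonlite or similar will be able to read it back, since it is no longer legal ndjson, nor is the file as a whole legal JSON), you can do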

txtjson <- paste(sapply(json, jsonlite::toJSON, pretty = TRUE), collapse = "\n")

and then save it with writeLines or similar.
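
If the goal is instead a file that jsonlite can read back, one option is to turn the original pretty-printed file into real NDJSON (one compact object per line), after which the stream_in call from the original answer should apply. Below is a minimal base-R sketch; to_ndjson is a hypothetical helper name, and like the regex fix in the edit it assumes that no } followed by { occurs inside a string value.

```r
# Hypothetical helper: convert the lines of a file holding pretty-printed,
# concatenated JSON objects into NDJSON (one compact object per line).
# Assumes, like the regex fix, that no "}" followed by "{" occurs inside a
# string value.
to_ndjson <- function(lines) {
  # valid JSON strings cannot contain raw line breaks, so joining the lines
  # with a space only changes whitespace between tokens
  one <- paste(lines, collapse = " ")
  # put a line break between consecutive top-level objects
  one <- gsub("\\}\\s*\\{", "}\n{", one)
  # one element per object, with surrounding whitespace trimmed
  trimws(strsplit(one, "\n", fixed = TRUE)[[1]])
}
```

For example, writeLines(to_ndjson(readLines(fn)), "fixed.ndjson") followed by jsonlite::stream_in(file("fixed.ndjson"), simplifyDataFrame = FALSE); the output file name fixed.ndjson is made up for illustration.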

answered 2019-07-10T17:29:17.273