groovy - Fixing JSON formatting problems in Groovy: Apache NiFi ExecuteScript

Question

I am using Apache NiFi, and one of my flow files contains slightly malformed JSON:

{
"field" : "value",
"field1" : "value1"

}0;0

Rather than using the transformation that was applied previously, I want to fix this with a Groovy script in ExecuteScript. This is what I have so far:

import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import org.apache.commons.io.IOUtils
import java.nio.charset.*


def flowFile = session.get()
if (!flowFile) return

def slurper = new groovy.json.JsonSlurper()

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def resultingText = text.substring(0, text.indexOf('}'))
    def json = slurper.parseText(resultingText)

    outputStream.write(json.toString().getBytes(StandardCharsets.UTF_8))

} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)

However, I get the following error:

ExecuteScript[id=69ae1948-f20b-446c-b33f-298c6faa7c98] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: javax.script.ScriptException: javax.script.ScriptException: groovy.json.JsonException: expecting '}' or ',' but got current char [SPACE] with an int value of 32

The current character read is [SPACE] with an int value of 32
expecting '}' or ',' but got current char [SPACE] with an int value of 32
line number 5
index number 61

...^: org.apache.nifi.processor.exception.ProcessException: javax.script.ScriptException: javax.script.ScriptException: groovy.json.JsonException: expecting '}' or ',' but got current char [SPACE] with an int value of 32

The current character read is [SPACE] with an int value of 32
expecting '}' or ',' but got current char [SPACE] with an int value of 32
line number 5
index number 61

...^

Am I doing anything obviously wrong? Thanks for your help.

score 1 · Accepted Answer

The end index of substring is exclusive, so the closing '}' is cut off. You need:

def resultingText = text.substring(0, text.indexOf('}') + 1)

Alternatively, you can use a range in Groovy (which is inclusive):

def resultingText = text[0..text.indexOf('}')]
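The difference between the exclusive end index of substring and an inclusive Groovy range can be checked outside NiFi. The following is a minimal sketch; the sample text is the flow file content from the question, and the variable names are only for illustration.

```groovy
// Sample content from the question, including the trailing "0;0" junk
def text = '{\n"field" : "value",\n"field1" : "value1"\n\n}0;0'
def end = text.indexOf('}')

// substring's end index is exclusive, so the closing brace is dropped
def truncated = text.substring(0, end)

// Adding 1, or using an inclusive range, keeps the closing brace
def viaSubstring = text.substring(0, end + 1)
def viaRange = text[0..end]

assert !truncated.endsWith('}')
assert viaSubstring.endsWith('}')
assert viaSubstring == viaRange
```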

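The resulting text is already your result. There is no need to parse it into a map with JsonSlurper (unless you just want to validate it), and json.toString() won't return what you want: it returns the string representation of the map, not JSON.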
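If the parsed map does need to be written back as JSON, groovy.json.JsonOutput can serialize it. This is a small sketch of the difference, assuming a sample input with the two fields from the question; it is not part of the original answer.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

def parsed = new JsonSlurper().parseText('{"field" : "value", "field1" : "value1"}')

// toString() on the parsed map gives Groovy's map notation, e.g. {field=value, ...}
def asMapString = parsed.toString()

// JsonOutput.toJson serializes the map back to JSON text
def asJson = JsonOutput.toJson(parsed)

println asMapString
println asJson
```

In the ExecuteScript callback, this would mean writing JsonOutput.toJson(json) rather than json.toString() to the output stream.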

Using indexOf('}') will break if any of your input JSON has nested objects :-(

def resultingText = text[0..text.lastIndexOf('}')]

might be better :-)
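To illustrate why lastIndexOf is the safer choice, here is a short sketch with a made-up nested document (the field names are hypothetical). Note that lastIndexOf still assumes the trailing junk itself contains no '}'.

```groovy
// Hypothetical input with a nested object followed by trailing junk
def nested = '{"outer" : {"inner" : "value"}, "field" : "x"}0;0'

// indexOf stops at the first closing brace, which belongs to the inner object
def first = nested[0..nested.indexOf('}')]

// lastIndexOf keeps everything up to the outermost closing brace
def last = nested[0..nested.lastIndexOf('}')]

assert first == '{"outer" : {"inner" : "value"}'
assert last == '{"outer" : {"inner" : "value"}, "field" : "x"}'
```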

score 0 · Accepted Answer

I ended up with this (thanks @tim_yates):

import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback

import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

def slurper = new groovy.json.JsonSlurper()

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def resultingText = text[0..text.lastIndexOf('}')]
    def json = slurper.parseText(resultingText)

    outputStream.write(json.toString().getBytes(StandardCharsets.UTF_8))

} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)

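The posted example is only part of the real JSON. Unfortunately, the JSON has many quirks, such as values wrapped in doubled double quotes. For example: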

{
 "field" : ""value""
}0;0

The code above works with the following format:

{
 "field" : "value"
}0;0

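I think I need to fix the JSON formatting first, and then I can remove the extra characters. Is there an easy way to make sure the format is correct?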

Thank you very much

Edit:

I was actually wrong; the value returned after running the script is:

{field=value}

Edit: this now works:

import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback

import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def resultingText = text[0..text.lastIndexOf('}')].replaceAll('""', '"')

    outputStream.write(resultingText.toString().getBytes(StandardCharsets.UTF_8))

} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)
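The string handling in this script can be tried on its own, without a NiFi session. The sketch below uses the doubled-quote sample from above and also shows one caveat of the replaceAll approach: a value that is legitimately an empty string ("") gets damaged by the same replacement. The second input is a made-up example.

```groovy
// Sample with doubled quotes and trailing junk, as in the post
def doubled = '{\n "field" : ""value""\n}0;0'
def cleaned = doubled[0..doubled.lastIndexOf('}')].replaceAll('""', '"')
assert cleaned == '{\n "field" : "value"\n}'

// Caveat (hypothetical input): an empty string value loses one of its quotes
def withEmpty = '{"a" : "", "b" : ""x""}'
def damaged = withEmpty.replaceAll('""', '"')
assert damaged == '{"a" : ", "b" : "x"}'
```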

However, there are still many quirks in the data that will become a burden later in the data flow. Here is an example of the full JSON:

{"field1": "D",
 "field2": "12345",
 "field3": "myText",
 "field4": ,
 "field5": "B2",
 "field6": "B",
 "field7": 74664",
 "field8": 2,
 "field9": [something."2334", something."9973"],
 "field10": ,
 "field11": "9,
 "field12": "J"}

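I have managed to remove the doubled '"', but I am concerned about values that have a " on only the right or left side (e.g. field7 and field11), values without any quotes (e.g. field8), empty values, and field9, which should be ["something.2334", "something.9973"].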

I would like to know how to make sure the JSON is well-formed (for example, so that it can later be ingested into a database).

Thank you very much
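One possible direction for the quirks listed above is a line-oriented repair pass. This is a rough sketch under strong assumptions, not a general JSON fixer: it assumes one "key": value pair per line, no commas or quotes inside scalar values, no nested objects, and it turns every scalar into a string and every empty value into null. The helper names repairValue and repairLines are hypothetical.

```groovy
import groovy.json.JsonOutput

// Hypothetical helper: normalizes one raw value taken from a line
def repairValue(String raw) {
    def value = raw.trim()
    if (value.startsWith('[') && value.endsWith(']')) {
        // List: split on commas and strip stray quotes from each item
        def inner = value.substring(1, value.length() - 1)
        return inner.trim() ? inner.split(',').collect { it.replace('"', '').trim() } : []
    }
    // Scalar: drop any stray quotes; an empty value becomes null (assumption)
    def bare = value.replace('"', '').trim()
    return bare ? bare : null
}

// Hypothetical helper: assumes exactly one "key": value pair per line
def repairLines(String text) {
    def result = [:]
    text.eachLine { line ->
        def m = line =~ /\s*\{?\s*"([^"]+)"\s*:\s*(.*?)\s*,?\s*\}?\s*/
        if (m.matches()) {
            result[m.group(1)] = repairValue(m.group(2))
        }
    }
    return result
}

def sample = '''{"field1": "D",
 "field4": ,
 "field7": 74664",
 "field8": 2,
 "field9": [something."2334", something."9973"],
 "field11": "9,
 "field12": "J"}'''

def repaired = repairLines(sample)
println JsonOutput.toJson(repaired)
```

Whether field8 should stay a number and whether empty values should become null or be dropped depends on the target schema, so those choices would need adjusting before using anything like this in a flow.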
