
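I've been working on this for a few weeks (in the background) and I'm stumped on how to use the NiFi JoltTransformJSON processor to convert CSV-like JSON data into a tagged set. By that I mean using the values in the first row of the input array as the JSON object names (keys) in the output.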

As an example, I have this input data:

[
  [
    "Company",
    "Retail Cost",
    "Percentage"
  ],
  [
    "ABC",
    "5,368.11",
    "17.09%"
  ],
  [
    "DEF",
    "101.47",
    "0.32%"
  ],
  [
    "GHI",
    "83.79",
    "0.27%"
  ]
]

The output I want to get is:

[
  {
    "Company": "ABC",
    "Retail Cost": "5,368.11",
    "Percentage": "17.09%"
  },
  {
    "Company": "DEF",
    "Retail Cost": "101.47",
    "Percentage": "0.32%"
  },
  {
    "Company": "GHI",
    "Retail Cost": "83.79",
    "Percentage": "0.27%"
  }
]

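I think this boils down to two problems: accessing the contents of the first inner array, and then making sure the output does not contain that first array.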
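Outside of Jolt, the intended mapping is just "zip the header row with every later row". The plain-Python sketch below (not Jolt, and not how the processor works internally) only shows what the spec needs to compute; `rows_to_objects` is a hypothetical helper name.

```python
from typing import Dict, List


def rows_to_objects(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn CSV-like rows into dicts keyed by the first (header) row."""
    if not rows:
        return []
    header = rows[0]  # problem 1: read the contents of the first array
    # problem 2: start at index 1 so the header row is not emitted itself.
    # Note: zip() silently truncates rows shorter than the header.
    return [dict(zip(header, row)) for row in rows[1:]]


data = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]
print(rows_to_objects(data))
```

The Jolt problem is therefore expressing "use row 0 as the key lookup" and "skip row 0" inside a shift spec.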

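I'd like to post a Jolt spec showing that I've gotten somewhat close, but the closest I've come gives me the right output shape with the wrong content. It looks like this: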

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": "[&1].&0"
      }
    }
  }
]

But it produces this output:

[ {
  "0" : "Company",
  "1" : "Retail Cost",
  "2" : "Percentage"
}, {
  "0" : "ABC",
  "1" : "5,368.11",
  "2" : "17.09%"
}, {
  "0" : "DEF",
  "1" : "101.47",
  "2" : "0.32%"
}, {
  "0" : "GHI",
  "1" : "83.79",
  "2" : "0.27%"
} ]

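This obviously has the wrong object names (the inner-array indices instead of the header values), and the output has one element too many (the header row itself).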
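A rough Python emulation of what `"*": {"*": "[&1].&0"}` does may make the failure clearer. This is a hypothetical helper written for illustration, not Jolt's implementation: `&1` resolves to the outer-array index and `&0` to the inner-array index, so every value is keyed by its position and the header row is copied like any other row.

```python
from typing import Dict, List


def emulate_index_shift(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Approximate the shift spec {"*": {"*": "[&1].&0"}}."""
    output: List[Dict[str, str]] = []
    for outer_index, row in enumerate(rows):       # matched by the outer "*"
        obj: Dict[str, str] = {}
        for inner_index, value in enumerate(row):  # matched by the inner "*"
            obj[str(inner_index)] = value          # key is &0: "0", "1", "2"
        output.append(obj)                         # lands at index [&1]
    return output


data = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]
print(emulate_index_shift(data))
```

Both symptoms follow directly: the keys come from `&0` rather than from row 0, and nothing excludes row 0 from the output.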


1 Answer


It can be done, but the result is hard to read and looks like a nasty regex.

Spec

[
  {
    // this does most of the work, but produces an output
    //  array with a null in the zeroth slot.
    "operation": "shift",
    "spec": {
      // match the first item in the outer array and do 
      //  nothing with it, because it is just "header" data
      //   e.g. "Company", "Retail Cost", "Percentage".
      // we need to reference it, but not pass it thru
      "0": null,
      // 
      // loop over all the rest of the items in the outer array
      "*": {
        // this is rather confusing
        // "*" means match the array indices of the inner array,
        // and we will write the value at that index ("ABC" etc.)
        // to "[&1].@(2,[0].[&])"
        // "[&1]" means make the output an array and write at index
        //   &1, which is the index of the outer array we are
        //   currently in.
        // Then "look up the key" (Company, Retail Cost) using
        //  @(2,[0].[&]),
        // which means: go back up the tree to the root, then
        //  come back down into the first item of the outer array,
        //  and index it by the current position (&) within the
        //  inner array that we are at.
        "*": "[&1].@(2,[0].[&])"
      }
    }
  },
  {
    // We know the first item in the array will be null / junk,
    //  because the first item in the input array was "header" info.
    // So we match the first item, and then accumulate everything
    //  into a new array
    "operation": "shift",
    "spec": {
      "0": null,
      "*": "[]"
    }
  }
]
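To see why two passes are needed, here is a rough Python emulation of the two shift operations above. It is an illustrative sketch, not Jolt's implementation; `first_pass` and `second_pass` are hypothetical names. Writing rows 1..n to `[&1]` leaves slot 0 unfilled (null), which the second shift drops before compacting with `"[]"`.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]


def first_pass(rows: List[List[str]]) -> List[Optional[Row]]:
    """Approximate shift #1: "0": null, "*": {"*": "[&1].@(2,[0].[&])"}."""
    if not rows:
        return []
    header = rows[0]  # @(2,[0]...): up to the root, then into item 0
    output: List[Optional[Row]] = [None] * len(rows)
    for outer_index in range(1, len(rows)):  # "0" is matched and discarded
        obj: Row = {}
        for inner_index, value in enumerate(rows[outer_index]):
            obj[header[inner_index]] = value  # key looked up as header[&]
        output[outer_index] = obj  # written at [&1], so slot 0 stays None
    return output


def second_pass(items: List[Optional[Row]]) -> List[Optional[Row]]:
    """Approximate shift #2: "0": null, "*": "[]" (drop item 0, compact)."""
    return [item for index, item in enumerate(items) if index != 0]


data = [
    ["Company", "Retail Cost", "Percentage"],
    ["ABC", "5,368.11", "17.09%"],
    ["DEF", "101.47", "0.32%"],
    ["GHI", "83.79", "0.27%"],
]
print(second_pass(first_pass(data)))
```

Chaining the two operations in one spec array is what lets a single JoltTransformJSON processor produce the desired output.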
answered 2017-09-30T17:04:19.017