json - Duplicating a document with different configuration via a Jolt transform

Question

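My goal is to take an input document that contains an array somewhere in its subtree and produce an array holding one copy of the whole document per array element, with each successive copy carrying a different element of that array as its own setting.

For example:

Starting document: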

{
  "config": {
    "activeConfig": {
      "sourceDatabase": "test",
      "targetSites": [
        {
          "siteName": "location1",
          "targetDatabase": "devl",
          "siteShortName": "123"
        },
        {
          "siteName": "location2",
          "targetDatabase": "123",
          "siteShortName": "123"
        }
      ]
    }
  },
  "secondData": {
    "queries": [
      {
        "Tablename": "abc",
        "Query": "123"
      }
    ]
  }
}

Expected output:

[ {
  "config" : {
    "activeConfig" : {
      "sourceDatabase" : "test",
      "targetSites" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ],
      "currentSite" : {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }
    }
  },
  "secondData" : {
    "queries" : [ {
      "Tablename" : "abc",
      "Query" : "123"
    } ]
  }
}, {
  "config" : {
    "activeConfig" : {
      "sourceDatabase" : "test",
      "targetSites" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ],
      "currentSite" : {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      }
    }
  },
  "secondData" : {
    "queries" : [ {
      "Tablename" : "abc",
      "Query" : "123"
    } ]
  }
} ]

The Jolt spec I have so far is:

[
  {
    "operation": "shift",
    "spec": {
      "config": {
        "activeConfig": {
          "targetSites": {
            "*": {
              "@4": "[]",
              "@": "[].config.activeConfig.currentSite"
            }
          }
        }
      }
    }
  }
]

This gets me close, but not all the way there:

[ {
  "config" : {
    "activeConfig" : {
      "sourceDatabase" : "test",
      "targetSites" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ]
    }
  },
  "secondData" : {
    "queries" : [ {
      "Tablename" : "abc",
      "Query" : "123"
    } ]
  }
}, {
  "config" : {
    "activeConfig" : {
      "currentSite" : {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }
    }
  }
}, {
  "config" : {
    "activeConfig" : {
      "sourceDatabase" : "test",
      "targetSites" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ]
    }
  },
  "secondData" : {
    "queries" : [ {
      "Tablename" : "abc",
      "Query" : "123"
    } ]
  }
}, {
  "config" : {
    "activeConfig" : {
      "currentSite" : {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      }
    }
  }
} ]

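This spec creates the structures I am looking for, but it does not merge them. My final array ends up with 4 items: 2 copies of the original document, plus the 2 items from the targetSites array. My goal is to merge those two items into the document copies, so that I have two copies of the original document, each configured with one value.

The only other spec that has gotten me close is: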

[
  {
    "operation": "shift",
    "spec": {
      "config": {
        "activeConfig": {
          "targetSites": {
            "*": {
              "@4": "[&]",
              "@": "[&].config.activeConfig.currentSite"
            }
          }
        }
      }
    }
  }
]

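This produces two copies of the document in the final array, but currentSite ends up holding all of the values from the targetSites array in each copy, rather than one per copy: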

[ {
  "config" : {
    "activeConfig" : {
      "sourceDatabase" : "test",
      "targetSites" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ],
      "currentSite" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ]
    }
  },
  "secondData" : {
    "queries" : [ {
      "Tablename" : "abc",
      "Query" : "123"
    } ]
  }
}, {
  "config" : {
    "activeConfig" : {
      "sourceDatabase" : "test",
      "targetSites" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ],
      "currentSite" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ]
    }
  },
  "secondData" : {
    "queries" : [ {
      "Tablename" : "abc",
      "Query" : "123"
    } ]
  }
} ]

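(As for why: the next step for this document is to split it into two flow files in a NiFi flow, which allows each file to be configured individually.)

Thanks for any input or help you can offer.

Update:

I found another interesting behavior that I am struggling to understand.

With the following spec, I get output that does not make sense to me.

Spec: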

[
  {
    "operation": "shift",
    "spec": {
      "config": {
        "activeConfig": {
          "targetSites": {
            "*": {
              "@4": "[&]",
              "@": "[&].config.activeConfig.currentSite&"
            }
          }
        }
      }
    }
  }
]

Output:

[ {
  "config" : {
    "activeConfig" : {
      "sourceDatabase" : "test",
      "targetSites" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ],
      "currentSite0" : {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      },
      "currentSite1" : {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      }
    }
  },
  "secondData" : {
    "queries" : [ {
      "Tablename" : "abc",
      "Query" : "123"
    } ]
  }
}, {
  "config" : {
    "activeConfig" : {
      "sourceDatabase" : "test",
      "targetSites" : [ {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      }, {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      } ],
      "currentSite0" : {
        "siteName" : "location1",
        "targetDatabase" : "devl",
        "siteShortName" : "123"
      },
      "currentSite1" : {
        "siteName" : "location2",
        "targetDatabase" : "123",
        "siteShortName" : "123"
      }
    }
  },
  "secondData" : {
    "queries" : [ {
      "Tablename" : "abc",
      "Query" : "123"
    } ]
  }
} ]

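I tried changing the output path to "@": "[&].config.activeConfig.currentSite&" so that & is used in two places. The result is similar to my second example above, in that both values end up in both copies, but here one lands in currentSite0 and the other in currentSite1, in both array index 0 and array index 1. That suggests that when & is evaluated in the expression "[&].config.activeConfig.currentSite&", it behaves as if it held both 0 and 1 at once. I am clearly missing some nuance of this behavior.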

score 1 · Accepted Answer

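You have to use two shifts. In general, when doing "stuff" with arrays, you need one shift operation for each "thing" you want to do.

In your case, you want to 1) copy the content into an output array, and 2) copy out one specific target site per copy.

Spec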
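A rough Python sketch of why doing both things in one shift mixes the sites together. It rests on an assumption about Jolt's internals that is not stated in this thread: that "@4" places the same tree object in every output slot rather than a fresh copy, and that a second write to an occupied key turns the value into a list. Under that assumption the outputs in the question fall out naturally. The names make_doc and single_pass are made up for illustration.

```python
def make_doc():
    """Trimmed-down version of the input document from the question."""
    return {
        "config": {
            "activeConfig": {
                "sourceDatabase": "test",
                "targetSites": [
                    {"siteName": "location1"},
                    {"siteName": "location2"},
                ],
            }
        }
    }


def single_pass(doc, suffix_index=False):
    """Imitate one shift that writes the tree and the current site together.

    ASSUMPTION: every output slot receives the *same* object, not a copy.
    """
    out = []
    sites = doc["config"]["activeConfig"]["targetSites"]
    for i, site in enumerate(sites):
        out.append(doc)  # like "@4": "[&]", but the same object each time
        active = out[i]["config"]["activeConfig"]
        key = f"currentSite{i}" if suffix_index else "currentSite"
        if key in active:
            # Second write to an occupied key: collect the values in a list.
            prev = active[key]
            active[key] = (prev if isinstance(prev, list) else [prev]) + [site]
        else:
            active[key] = site
    return out


shared = single_pass(make_doc())
suffixed = single_pass(make_doc(), suffix_index=True)
print(shared[0] is shared[1])
print(sorted(suffixed[0]["config"]["activeConfig"]))
```

If that assumption holds, & never has two values at once. Each write goes through a different array index but lands in the one shared tree, so both "copies" show every site. Building the copies in one shift and picking the site in a second shift avoids this.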

[
  // Step 1: Make the copies of the input data, based on the number
  //  of items in the targetSites array.
  {
    "operation": "shift",
    "spec": {
      "config": {
        "activeConfig": {
          "targetSites": {
            "*": { // targetSites array index
              // go back up 4 levels and grab the whole tree "@4"
              // and write it to the output as a top level array
              // indexed by the "targetSites array index"
              "@4": "[&1]"
            }
          }
        }
      }
    }
  },
  {
    // Step 2 : Annoyingly copy everything across, but use the 
    //  value of the top level array index, to copy the "right" 
    //  data out of the targetSites array.
    "operation": "shift",
    "spec": {
      "*": { // top level array index
        "config": {
          "activeConfig": {
            // sourceDatabase sits under activeConfig in the sample input
            "sourceDatabase": "[&3].config.activeConfig.sourceDatabase", // straight copy across
            "targetSites": {
              "@": "[&4].config.activeConfig.targetSites", // straight copy across
              //
              // Nifty but very rarely used feature.
              // Use "&3" to lookup the "current" value of the top level array index
              //  and then use that as an index into the targetSites array, and copy
              //  that across as "currentSite"
              "&3": "[&4].config.activeConfig.currentSite"
            }
          }
        },
        "secondData": "[&1].secondData" // straight copy across
      }
    }
  }
]
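For checking the result outside of Jolt, here is a minimal plain-Python sketch of the same two steps. It is not how Jolt works internally, only a reference for what the output should look like. The function name fan_out is hypothetical, and the source document is the one from the question.

```python
import copy
import json


def fan_out(doc):
    """Return one independent copy of doc per targetSites entry,
    each with its own currentSite."""
    sites = doc["config"]["activeConfig"]["targetSites"]
    copies = []
    for index in range(len(sites)):
        # Step 1: one deep copy of the whole document per site.
        clone = copy.deepcopy(doc)
        active = clone["config"]["activeConfig"]
        # Step 2: use the copy's position to pick the matching site.
        active["currentSite"] = copy.deepcopy(active["targetSites"][index])
        copies.append(clone)
    return copies


source = {
    "config": {
        "activeConfig": {
            "sourceDatabase": "test",
            "targetSites": [
                {"siteName": "location1", "targetDatabase": "devl", "siteShortName": "123"},
                {"siteName": "location2", "targetDatabase": "123", "siteShortName": "123"},
            ],
        }
    },
    "secondData": {"queries": [{"Tablename": "abc", "Query": "123"}]},
}

result = fan_out(source)
print(json.dumps(result, indent=2))
```

The deep copy is the point of the sketch: each element of result is its own object, so setting currentSite on one copy cannot leak into the other, and the source document is left untouched.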
