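I have a newline-delimited JSON file. Is it possible to generate a schema from it using a tool like jq? I've had some success with jq in the past, but I haven't done anything this complicated.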

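Here is my target schema format: https://cloud.google.com/bigquery/docs/nested-repeated#example_schema. Note that nesting is handled with a fields key on the parent, and arrays are handled with "mode": "repeated". (Any help producing some kind of schema that I can then massage into this format would be much appreciated.)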

Copying from the link above, I would like to go from this:

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}

...to this...

[
    {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "first_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "last_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "dob",
        "type": "DATE",
        "mode": "NULLABLE"
    },
    {
        "name": "addresses",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {
                "name": "status",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "address",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "city",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "state",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "zip",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "numberOfYears",
                "type": "STRING",
                "mode": "NULLABLE"
            }
        ]
    }

]

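(See "BigQuery autodetect doesn't work with inconsistent json?", which indicates that I can't use BigQuery autodetect because the items are not all alike. I'm fairly confident I can manually merge the per-item schemas together to create a superset.)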
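The superset idea mentioned above could look roughly like the following. This is only a sketch in Python, not anything from the BigQuery tooling: the function name merge_fields is hypothetical, and the conflict rules (fall back to STRING when two types disagree, let REPEATED win over NULLABLE) are assumptions you would want to adjust for your data.

```python
def merge_fields(a, b):
    """Merge two lists of BigQuery-style field dicts into a superset, keyed by name."""
    # Shallow copies so the inputs are not modified; dicts keep first-seen order.
    merged = {field["name"]: dict(field) for field in a}
    for field in b:
        name = field["name"]
        if name not in merged:
            merged[name] = dict(field)
            continue
        existing = merged[name]
        if existing["type"] == "RECORD" and field["type"] == "RECORD":
            # Both sides are nested records: merge their sub-fields recursively.
            existing["fields"] = merge_fields(
                existing.get("fields", []), field.get("fields", [])
            )
        elif existing["type"] != field["type"]:
            # Assumption: conflicting types fall back to STRING.
            existing["type"] = "STRING"
            existing.pop("fields", None)
        # Assumption: if either side is REPEATED, the merged field is REPEATED.
        if "REPEATED" in (existing["mode"], field["mode"]):
            existing["mode"] = "REPEATED"
    return list(merged.values())


schema_a = [
    {"name": "id", "type": "STRING", "mode": "NULLABLE"},
    {"name": "addresses", "type": "RECORD", "mode": "REPEATED",
     "fields": [{"name": "city", "type": "STRING", "mode": "NULLABLE"}]},
]
schema_b = [
    {"name": "addresses", "type": "RECORD", "mode": "REPEATED",
     "fields": [{"name": "zip", "type": "STRING", "mode": "NULLABLE"}]},
    {"name": "dob", "type": "DATE", "mode": "NULLABLE"},
]
superset = merge_fields(schema_a, schema_b)
```

Folding merge_fields over the schema of every line in the file would give one schema covering all items.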

2 Answers

Here is a simple recursive function that may help if you decide to roll your own:

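# Infer a BigQuery-style schema from a single JSON object.
def schema:
  # Anchored so that only strings that are exactly YYYY-MM-DD count as dates.
  def isdate($v):   $v | test("^[0-9]{4}-[0-9]{2}-[0-9]{2}$");
  # The first element stands in for the whole array; empty or scalar arrays are skipped.
  def array($k;$v): $v[0] | select(type == "object")
                    | {"name":$k,"type":"RECORD",mode:"REPEATED","fields":schema};
  def date($k):     {"name":$k,"type":"DATE",  mode:"NULLABLE"};
  def string($k):   {"name":$k,"type":"STRING",mode:"NULLABLE"};
  def item($k;$v):
     $v | if   type == "array"                 then array($k;$v)
          elif type == "string" and isdate($v) then date($k)
          elif type == "string"                then string($k)
      else empty end;   # numbers, booleans, nulls and nested objects are not handled
  [ to_entries[] | item(.key;.value) ]
;
schema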
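For readers working outside jq, the same recursive idea can be sketched in Python. The names infer_fields and DATE_RE are hypothetical, and treating a nested object as a NULLABLE RECORD is an assumption that goes beyond what the jq function above does.

```python
import json
import re

# Matches strings that are exactly YYYY-MM-DD (format check only, not calendar validity).
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def infer_fields(obj):
    """Return a list of BigQuery-style field dicts for one JSON object."""
    fields = []
    for key, value in obj.items():
        if isinstance(value, list):
            # Use the first element as representative; skip empty or scalar arrays.
            if value and isinstance(value[0], dict):
                fields.append({"name": key, "type": "RECORD", "mode": "REPEATED",
                               "fields": infer_fields(value[0])})
        elif isinstance(value, dict):
            # Assumption: a nested object maps to a NULLABLE RECORD.
            fields.append({"name": key, "type": "RECORD", "mode": "NULLABLE",
                           "fields": infer_fields(value)})
        elif isinstance(value, str):
            kind = "DATE" if DATE_RE.fullmatch(value) else "STRING"
            fields.append({"name": key, "type": kind, "mode": "NULLABLE"})
        # Numbers, booleans and nulls are skipped, as in the jq version.
    return fields


line = '{"id":"1","dob":"1968-01-22","addresses":[{"status":"current","zip":"11111"}]}'
result = infer_fields(json.loads(line))
print(json.dumps(result, indent=4))
```

For a newline-delimited file, you would call infer_fields once per line and then combine the per-line results into one schema.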

answered 2020-03-30T22:59:22.757

"Any help producing some kind of schema that I can then massage into this format would be much appreciated"

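http://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed has a schema inference module written in jq, but the schemas it infers are "structural": they mirror the shape of the input JSON. For your example, the inferred schema is shown below. As you can see, transforming it into the format you want would be quite easy, except that additional work would be needed to infer the mode values.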

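Note that the above module infers a "common schema" from an arbitrarily large "sample" of JSON documents. That is, it is a schema inference engine rather than a simple "schema generator".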

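The link above references a companion schema checker named JESS, which is also written in jq. The "E" in "JESS" stands for "extended", signifying that the JESS schema language used for specifying schemas allows complex constraints to be included.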

{
  "id": "string",
  "first_name": "string",
  "last_name": "string",
  "dob": "string",
  "addresses": [
    {
      "status": "string",
      "address": "string",
      "city": "string",
      "state": "string",
      "zip": "string",
      "numberOfYears": "string"
    }
  ]
}
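One way the structural schema above might be massaged into the BigQuery format is sketched below in Python. The function name to_bigquery and the TYPE_MAP table are hypothetical, and the sketch assumes the structural schema always has the shape shown: type names as strings, arrays as one-element lists. Every non-array field is simply marked NULLABLE, and "dob" stays a STRING, because neither the mode nor the DATE type can be recovered from the structural schema alone.

```python
import json

# Assumption: these structural type names map onto these BigQuery types.
TYPE_MAP = {"string": "STRING", "number": "FLOAT", "boolean": "BOOLEAN"}


def to_bigquery(structural):
    """Convert a structural schema (dict) into a list of BigQuery-style field dicts."""
    fields = []
    for name, spec in structural.items():
        mode = "NULLABLE"
        if isinstance(spec, list):
            # An array in the structural schema becomes a REPEATED field.
            mode = "REPEATED"
            spec = spec[0] if spec else "string"  # assumption: empty arrays hold strings
        if isinstance(spec, dict):
            fields.append({"name": name, "type": "RECORD", "mode": mode,
                           "fields": to_bigquery(spec)})
        else:
            # Unknown type names fall back to STRING (assumption).
            fields.append({"name": name, "type": TYPE_MAP.get(spec, "STRING"),
                           "mode": mode})
    return fields


structural = {
    "id": "string",
    "dob": "string",
    "addresses": [{"status": "string", "zip": "string"}],
}
converted = to_bigquery(structural)
print(json.dumps(converted, indent=4))
```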
answered 2020-03-30T22:47:07.627