I have a JSON file in S3 with the following structure:

{
  status: "Success",
  created_at: "19 AUG 2019",
  employees:[
     {"name":"name1", "id":"1"},
     {"name":"name2", "id":"2"},
     {"name":"name3", "id":"3"}
  ],
  contacts: [] 
}

The following SQL query works well for counting the contacts:

SELECT count(*) FROM S3Object[*].contacts[*]

Sometimes, however, the JSON file does not contain the contacts key at all, for example:

{
  status: "Success",
  created_at: "19 AUG 2019",
  employees:[
     {"name":"name1", "id":"1"},
     {"name":"name2", "id":"2"},
     {"name":"name3", "id":"3"}
  ] 
}

In this case, the SQL above returns a contact count of 1, but I want it to return zero.

How can I rewrite the SQL so that it handles both kinds of file content?

1 Answer

Try:

SELECT count(*) FROM S3Object[*].contacts[*] as item WHERE item IS NOT MISSING
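
For readers who run the query from code, here is a minimal Python sketch built on the boto3 select_object_content call. The helper name count_contacts, the bucket and key arguments, and the choice of JSON Type DOCUMENT are assumptions for illustration and are not part of the original question; the helper also assumes the object is valid JSON that S3 Select can parse.

```python
import json

# The query suggested in this answer, kept verbatim.
QUERY = (
    "SELECT count(*) FROM S3Object[*].contacts[*] as item "
    "WHERE item IS NOT MISSING"
)


def count_contacts(s3_client, bucket, key):
    """Run QUERY against one S3 object and return the count as an int.

    s3_client is expected to be a boto3 S3 client; bucket and key are
    hypothetical placeholders supplied by the caller.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=QUERY,
        # Assumption: the object holds a JSON document rather than JSON Lines.
        InputSerialization={"JSON": {"Type": "DOCUMENT"}},
        OutputSerialization={"JSON": {}},
    )
    # The result arrives as an event stream; record bytes may be split
    # across several "Records" events, so join them before parsing.
    payload = b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )
    first_line = payload.decode("utf-8").splitlines()[0]
    record = json.loads(first_line)
    # The aggregate comes back as a single-field record; take its only value.
    return int(next(iter(record.values())))
```

A call would look like count_contacts(boto3.client("s3"), "my-bucket", "path/to/file.json"), where the bucket and key names are placeholders.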

Explanation

Everything below is based on the SELECT Command reference.

Consider this example:

{
  {
    status: "Success",
    created_at: "19 AUG 2019",
    employees:[
      {"name":"name1", "id":"1"},
      {"name":"name2", "id":"2"},
      {"name":"name3", "id":"3"}
    ],
    contacts: [
      {"a": "123"},
      {"b": "456"}
    ]
  },
  {
    status: "Success",
    created_at: "19 AUG 2019",
    employees:[
      {"name":"name1", "id":"1"},
      {"name":"name2", "id":"2"},
      {"name":"name3", "id":"3"}
    ]
  }
}

If you run SELECT * FROM S3Object[*].contacts[*], the result is:

{"a": "123"}
{"b": "456"}
{}

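The second object has no contacts, so, according to the SELECT Command reference:

Amazon S3 Select emits MISSING, which is then changed to an empty record during output serialization and returned.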

So SELECT count(*) FROM S3Object[*].contacts[*] returns the number of items, which is 3.

If you run SELECT * FROM S3Object[*].contacts[*] as item WHERE item IS NOT MISSING, the result is:

{"a": "123"}
{"b": "456"}

All MISSING items are discarded, so SELECT count(*) FROM S3Object[*].contacts[*] as item WHERE item IS NOT MISSING returns the number of items, which is 2.
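
The counting behaviour described above can be mimicked with a small toy model in plain Python. This is only an illustration of the reasoning, not how S3 Select is implemented; the MISSING sentinel and the function names are hypothetical.

```python
# Toy model only: a sentinel standing in for S3 Select's MISSING value.
MISSING = object()


def flatten_contacts(records):
    """Mimic FROM S3Object[*].contacts[*] over a list of dict records."""
    for record in records:
        if "contacts" not in record:
            # A record without the key contributes one MISSING item.
            yield MISSING
        else:
            # A record with the key contributes each of its contacts.
            yield from record["contacts"]


def count_all(records):
    """Mimic SELECT count(*) with no WHERE clause."""
    return sum(1 for _ in flatten_contacts(records))


def count_present(records):
    """Mimic SELECT count(*) ... WHERE item IS NOT MISSING."""
    return sum(1 for item in flatten_contacts(records) if item is not MISSING)


records = [
    {"status": "Success", "contacts": [{"a": "123"}, {"b": "456"}]},
    {"status": "Success"},  # no contacts key
]

print(count_all(records))      # 3
print(count_present(records))  # 2
```

In the same model, a single record without a contacts key gives 1 for the unfiltered count and 0 for the filtered one, which matches the behaviour reported in the question.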

Answered 2019-08-28T16:17:01.300