
I need to import data from several thousand URLs. Here is a sample of the data:

[{"date":"20201006T120000Z","uri":"secret","val":"1765.756"},{"date":"20201006T120500Z","uri":"secret","val":"2015.09258 "},{"date":"20201006T121000Z","uri":"secret","val":"2283.0885"}]

Since COPY does not support the JSON format, I have been using the following to import data from some of the URLs:

-- staging table to hold each fetched payload as jsonb
CREATE TEMP TABLE stage(x jsonb);

-- load one URL's output into the staging table
COPY stage FROM PROGRAM 'curl https://.....';

-- expand the jsonb array into rows of the target table
INSERT INTO test_table
SELECT f.* FROM stage, jsonb_populate_recordset(null::test_table, x) f;

But this is inefficient, because it creates a table for every import and imports one URL at a time. I would like to know whether it is possible (with a tool, script, or command) to read a file containing all the URLs and copy their data into the database.


1 Answer


With your example data, all you would have to do is remove the first character of the first line and the last printable character (either , or ]) of every line, and then it would be compatible with COPY. There could be JSON that would break this (either through its formatting or its content), but such data would also break your example code. If your example code does work, then perhaps you will never have such problematic data or formatting, or perhaps you just haven't run into it yet.
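As an illustration only (the sed expressions below are mine, not part of the answer, and they assume the raw output really does carry one JSON object per line as described above), the stripping could happen in the same pipeline that fetches the data:

-- strip a leading '[' on the first line and a trailing ',' or ']' on every line,
-- so each remaining line is a standalone JSON object that COPY loads as one row
COPY stage FROM PROGRAM
  $cmd$curl -s https://..... | sed -e '1s/^\[//' -e 's/[],]$//'$cmd$;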

You could either add a processing step to remove those nuisance characters, or you could change the way you fetch the data in bulk (which you didn't describe) to avoid outputting them in the first place.
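If changing the bulk fetch is an option, one way to read every URL from a file in a single COPY is sketched below. This is only a sketch under assumptions the question does not state: a hypothetical urls.txt with one URL per line, and output that ends up as one JSON object per line once the array punctuation is stripped.

CREATE TEMP TABLE stage(x jsonb);

-- xargs passes the URLs from urls.txt to curl in batches;
-- sed removes the array punctuation so each surviving line is a bare JSON object
COPY stage FROM PROGRAM
  $cmd$xargs -n 50 curl -s < urls.txt | sed -e 's/^\[//' -e 's/[],]$//'$cmd$;

-- with one object per staging row, populate a single record per row
INSERT INTO test_table
SELECT f.* FROM stage, jsonb_populate_record(null::test_table, x) f;

If instead each URL returns its whole array on a single line (as in the sample), you could skip the sed step, load one array per staging row, and keep the original jsonb_populate_recordset query.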

answered 2021-04-05T02:09:05.547