
I am trying to load tables from SAP Data Services into Snowflake through an S3 bucket (the data needs to be bulk loaded).

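I cannot get the output file written to the S3 bucket into the right format. I have problems with line breaks (there are none) and with dates (extra precision), and because the delimiter is a comma, I may also run into trouble whenever a text value contains a comma.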

I have seen that it is possible to write the file to the S3 bucket as JSON with a nested schema. But if I do that, I don't know how to run the COPY from Snowflake.

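This project is a migration: I am replacing the old database with Snowflake. The jobs in SAP DS already exist, and the idea is to change only the target, not the data flow.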

If anyone could give me some help, that would be great. Thanks.


1 Answer


You can load JSON files into a table that has a single column of type VARIANT.

Here is an example:

/* Create a JSON file format that strips the outer array. */

create or replace file format json_format
  type = 'JSON'
  strip_outer_array = true;

/* Create an internal stage that references the JSON file format. */

create or replace stage mystage
  file_format = json_format;

/* Stage the JSON file. */

put file:///tmp/sales.json @mystage auto_compress=true;

/* Create a target table for the JSON data. */

create or replace table house_sales (src variant);

/* Copy the JSON data into the target table. */

copy into house_sales
   from @mystage/sales.json.gz;

select * from house_sales;

+---------------------------+
| SRC                       |
|---------------------------|
| {                         |
|   "location": {           |
|     "city": "Lexington",  |
|     "zip": "40503"        |
|   },                      |
|   "price": "75836",       |
|   "sale_date": "4-25-16", |
|   "sq__ft": "1000",       |
|   "type": "Residential"   |
| }                         |
| {                         |
|   "location": {           |
|     "city": "Belmont",    |
|     "zip": "02478"        |
|   },                      |
|   "price": "92567",       |
|   "sale_date": "6-18-16", |
|   "sq__ft": "1103",       |
|   "type": "Residential"   |
| }                         |
| {                         |
|   "location": {           |
|     "city": "Winchester", |
|     "zip": "01890"        |
|   },                      |
|   "price": "89921",       |
|   "sale_date": "1-31-16", |
|   "sq__ft": "1122",       |
|   "type": "Condo"         |
| }                         |
+---------------------------+
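
Once the rows are in the VARIANT column, individual fields can be pulled out with path notation and cast to SQL types. A minimal sketch against the house_sales table above; the column aliases are my own choice, and sale_date is left as a string because values like "4-25-16" would need an explicit date format:

```sql
/* Extract typed columns from the VARIANT column src. */
select
  src:location.city::string as city,
  src:location.zip::string  as zip_code,
  src:price::number         as price,
  src:sq__ft::number        as sq_ft,
  src['type']::string       as property_type,  -- bracket notation for the "type" key
  src:sale_date::string     as sale_date_raw   -- kept as text, not parsed as a date
from house_sales;
```

The same expressions could be used in a view or in an insert ... select into a typed table if the downstream jobs expect regular columns.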

For more information, see here.
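
Since the files in the question already land in S3, the put step would not be needed; an external stage pointing at the bucket could be used instead. A rough sketch, assuming a hypothetical bucket path (s3://my-bucket/sap-ds-export/) and a hypothetical storage integration named my_s3_integration that has already been set up by an account admin:

```sql
/* External stage over the S3 location written by SAP DS.
   Bucket path and integration name are placeholders, not real objects. */
create or replace stage my_s3_stage
  url = 's3://my-bucket/sap-ds-export/'
  storage_integration = my_s3_integration
  file_format = json_format;

/* Bulk load every JSON file under that path into the VARIANT table. */
copy into house_sales
  from @my_s3_stage
  pattern = '.*[.]json';
```

The stage reuses the json_format file format from the example above, so strip_outer_array still applies if each file contains one outer array of records.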

You can also query the staged JSON file directly; see the following example:

create or replace file format my_json_format type = 'json';
select * from @~/example_2.json.gz 
(
  file_format => my_json_format
);

I get:

{
          "quiz": {
                    "maths": {
                              "q1": {
                                        "answer": "12",
                                        "options": [
                                                  "10",
                                                  "11",
                                                  "12",
                                                  "13"
                                        ],
                                        "question": "5 + 7 = ?"
                              },
                              "q2": {
                                        "answer": "4",
                                        "options": [
                                                  "1",
                                                  "2",
                                                  "3",
                                                  "4"
                                        ],
                                        "question": "12 - 8 = ?"
                              }
                    },
                    "sport": {
                              "q1": {
                                        "answer": "Huston Rocket",
                                        "options": [
                                                  "New York Bulls",
                                                  "Los Angeles Kings",
                                                  "Golden State Warriros",
                                                  "Huston Rocket"
                                        ],
                                        "question": "Which one is correct team name in NBA?"
                              }
                    }
          }
}

I can also do this:

select parse_json($1):quiz.maths from @~/example_2.json.gz 
(
  file_format => my_json_format
);

I get:

{
          "q1": {
                    "answer": "12",
                    "options": [
                              "10",
                              "11",
                              "12",
                              "13"
                    ],
                    "question": "5 + 7 = ?"
          },
          "q2": {
                    "answer": "4",
                    "options": [
                              "1",
                              "2",
                              "3",
                              "4"
                    ],
                    "question": "12 - 8 = ?"
          }
}
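
To turn a nested object like this into rows, lateral flatten can be applied to the staged file as well. A small sketch using the same file and file format as above; the aliases s and q and the output column names are only illustrative:

```sql
/* One row per question under quiz.maths (q1, q2, ...). */
select
  q.key                     as question_id,
  q.value:question::string  as question,
  q.value:answer::string    as answer
from @~/example_2.json.gz (file_format => my_json_format) s,
     lateral flatten(input => s.$1:quiz.maths) q;
```

Because the file format is JSON, $1 is already a VARIANT here, so the path can be applied to it directly.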
Answered 2022-03-02T08:46:02.517