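I am trying to create a Parquet table from a CSV extract (generated from an Oracle database table) that has more than a million rows. About 25 of those rows have an empty START_DATE, and the CTAS fails because it does not interpret the empty string "" as null. Any suggestions would be greatly appreciated.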

CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;



Error: SYSTEM ERROR: IllegalArgumentException: Invalid format ""

2 Answers

You can also use the NULLIF() function, as shown below:

CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
to_timestamp(NULLIF(columns[3],''), 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;

NULLIF converts the empty string to null, so the conversion does not fail.
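
Before running the full CTAS, it may help to preview only the affected rows. The query below is a minimal sketch that reuses the file path and format string from the question. The TRIM() call is an assumption added here to also catch whitespace-only fields, which the question does not mention; drop it if the empty values are always exactly ''.

```sql
-- Preview the rows whose START_DATE field is empty (or, by assumption, blank).
-- NULLIF(x, '') yields NULL when x = '', otherwise it returns x unchanged.
SELECT columns[3] AS raw_value,
       to_timestamp(NULLIF(TRIM(columns[3]), ''), 'dd-MMM-yy HH.mm.ss.SSSSSS a') AS `START_DATE`
FROM dfs.`c:\work\prod\data\foo.csv`
WHERE TRIM(columns[3]) = ''
LIMIT 10;
```

If this returns rows with a null `START_DATE` and no error, the same expression should be safe to use in the CTAS.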

Answered 2016-10-29T10:36:51.023

You can always include a CASE expression to filter out the empty entries:

CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
CASE WHEN columns[3] = '' THEN null
  ELSE to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a') 
END as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;
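
Once the CTAS has succeeded, a quick count can confirm that the empty values were stored as nulls. This is a sketch that assumes the table was created as dfs.tmp.FOO, as in the statement above.

```sql
-- COUNT(*) counts every row; COUNT(column) skips rows where the column is NULL.
SELECT COUNT(*)            AS total_rows,
       COUNT(`START_DATE`) AS rows_with_start_date
FROM dfs.tmp.FOO;
```

The difference between the two counts should match the number of rows with an empty START_DATE in the CSV (about 25, according to the question).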
Answered 2015-11-25T18:48:55.603