我们有源和目标系统。尝试使用 talend 工具将数据从 SQL Server 2012 导入 Pivotal Hadoop (PHD 3.0) 版本。
得到错误:
ERROR: extra data after last expected column (seg0 slice1 datanode.domain.com:40000 pid=15035)
Detail: External table pick_report_stg0, line 5472 of pxf://masternnode/path/to/hdfs?profile=HdfsTextSimple: "5472;2016-11-28 08:39:54.217;;2016-11-15 00:00:00.0;SAMPLES;0005525;MORGAN -EVENTS;254056;1;IHBL-NHO..."
我们尝试了
我们已将 BAD 行标识为 [hdfs@mdw ~]$ hdfs dfs -cat /path/to/hdfs|grep 3548
3548;2016-11-28 04:21:39.97;;2016-11-15 00:00:00.0;SAMPLES;0005525;MORGAN -EVENTS;254056;1;IHBL-NHO-13OZ-01;0;ROC NATION; NH;2016-11-15 00:00:00.0;2016-11-15 00:00:00.0;;2.0;11.99;SA;SC01;NH02;EA;1;F2;NEW PKG ONLY PLEASE!! BY NOON
外部表结构和格式子句
CREATE EXTERNAL TABLE schemaname.tablename
(
"ID" bigint,
"time" timestamp without time zone,
"ShipAddress4" character(40),
"EntrySystemDate" timestamp without time zone,
"CorpAcctName" character(40),
"Customer" character(7),
"CustomerName" character(30),
"SalesOrder" character(6),
"OrderStatus" character(1),
"MStockCode" character(30),
"ShipPostalCode" character(9),
"CustomerPoNumber" character(30),
"OrderDate" timestamp without time zone,
"ReqShipDate" timestamp without time zone,
"DateValue" timestamp without time zone,
"MOrderQty" numeric(9,0),
"MPrice" numeric(9,0),
"CustomerClass" character(2),
"ProductClass" character(4),
"ProductGroup" character(10),
"StockUom" character(3),
"DispatchCount" integer,
"MWarehouse" character(2),
"AlphaValue" varchar(100)
)
LOCATION (
'pxf://path/to/hdfs?profile=HdfsTextSimple'
)
FORMAT 'csv' (delimiter ';' null '' quote ';')
ENCODING 'UTF8';
发现:出现额外的分号,导致额外的数据。但我仍然无法提供正确的格式条款。请指导如何删除额外的数据列错误。
我应该使用什么格式子句。
对此的任何帮助将不胜感激!