We have a small Pivotal Hadoop-HAWQ cluster, and we need to read an external table on it.

i.e. select * from ext_table

But when I run the query, HAWQ complains with the following error:

missing data for column "SoldToAddr2"

We tried the following:

We tried different special characters in the FORMAT clause of the ext_table definition.

For example:
CREATE READABLE EXTERNAL TABLE ext_table
(
 "ID" INTEGER,
 time timestamp,
 "Customer" char(7),
 "Name" varchar,
 "ShortName" char(10),
 "ExemptFinChg" char(1),
 "MaintHistory" char(1),
 "CustomerType" char(1),
 "MasterAccount" char(7),
 "StoreNumber" char(7),
 "PrtMasterAdd" char(1),
 "CreditStatus" char(1),
 "CreditLimit" decimal(14),
 "InvoiceCount" decimal(7),
 "Salesperson" char(3),
 "Salesperson1" char(3),
 "Salesperson2" char(3),
 "Salesperson3" char(3),
 "PriceCode" char(2),
 "CustomerClass" char(2),
 "Branch" char(2),
 "TermsCode" char(2),
 "InvDiscCode" char(2),
 "BalanceType" char(1),
 "Area" char(2),
 "LineDiscCode" char(2),
 "TaxStatus" char(1),
 "TaxExemptNumber" char(30),
 "SpecialInstrs" char(30),
 "PriceCategoryTable" char(52),
 "DateLastSale" date,
 "DateLastPay" date,
 "OutstOrdVal" decimal(16),
 "NumOutstOrd" decimal(6),
 "Telephone" char(20),
 "Contact" varchar,
 "AddTelephone" char(20),
 "Fax" char(20),
 "Telex" char(10),
 "TelephoneExtn" char(5),
 "Currency" text,
 "UserField1" char(10),
 "UserField2" decimal(16),
 "GstExemptFlag" char(1),
 "GstExemptNum" char(15),
 "GstLevel" char(1),
 "DetailMoveReqd" char(1),
 "InterfaceFlag" char(1),
 "ContractPrcReqd" char(1),
 "BuyingGroup1" char(2),
 "BuyingGroup2" char(2),
 "BuyingGroup3" char(2),
 "BuyingGroup4" char(2),
 "BuyingGroup5" char(2),
 "StatementReqd" char(1),
 "BackOrdReqd" char(1),
 "ShippingInstrs" char(30),
 "StateCode" char(3),
 "DateCustAdded" date,
 "StockInterchange" char(1),
 "MaintLastPrcPaid" char(1),
 "IbtCustomer" char(1),
 "SoDefaultDoc" char(1),
 "CounterSlsOnly" char(1),
 "PaymentStatus" char(1),
 "Nationality" char(3),
 "HighestBalance" decimal(16),
 "CustomerOnHold" char(1),
 "InvCommentCode" char(3),
 "EdiSenderCode" char(40),
 "RelOrdOsValue" decimal(16),
 "EdiFlag" char(1),
 "SoDefaultType" char(1),
 "Email" char(50),
 "ApplyOrdDisc" char(1),
 "ApplyLineDisc" char(1),
 "FaxInvoices" char(1),
 "FaxStatements" char(1),
 "HighInvDays" decimal(5),
 "HighInv" char(6),
 "DocFax" char(20),
 "DocFaxContact" char(40),
 "SoldToAddr1" char(150),
 "SoldToAddr2" char(80),
 "SoldToAddr3" char(40),
 "SoldToAddr4" char(40),
 "SoldToAddr5" char(40),
 "SoldPostalCode" char(9),
 "ShipToAddr1" char(40),
 "ShipToAddr2" char(40),
 "ShipToAddr3" char(40),
 "ShipToAddr4" char(40),
 "ShipToAddr5" char(40),
 "ShipPostalCode" char(9),
 "State" char(2),
 "CountyZip" char(5),
 "City" char(3),
 "State1" char(2),
 "CountyZip1" char(5),
 "City1" char(3),
 "DefaultOrdType" char(2),
 "PoNumberMandatory" char(1),
 "CreditCheckFlag" char(1),
 "CompanyTaxNumber" char(15),
 "DeliveryTerms" char(3),
 "TransactionNature" decimal(5),
 "DeliveryTermsC" char(3),
 "TransactionNatureC" decimal(5),
 "RouteCode" char(10),
 "FaxQuotes" char(1),
 "RouteDistance" decimal(6),
 "TpmCustomerFlag" char(1),
 "SalesWarehouse" text,
 "TpmPricingFlag" char(1),
 "ArStatementNo" char(2),
 "TpmCreditCheck" char(1),
 "WholeOrderShipFlag" char(1),
 "MinimumOrderValue" decimal(12),
 "MinimumOrderChgCod" char(6),
 "UkVatFlag" char(1),
 "UkCurrency" char(3),
 "TimeStamp" bytea
 )
LOCATION (
    'pxf://hostname/path/to/hdfs?profile=HdfsTextSimple')
FORMAT 'CSV' (delimiter '^' null 'null' quote '"')
ENCODING 'UTF8';

Error details:

i.e. a bad row was encountered at line 20

error detail :DETAIL:  External table tablename, line 20 of pxf://hostname/path/to/hdfs?profile=HdfsTextSimple: "23020^2015-12-02 11:14:26.52^0023482^Carlos iglesias               ^          ^N^Y^ ^       ^       ..."

What is a better way to troubleshoot this kind of error on a Pivotal Hadoop-HAWQ system?

Any help would be much appreciated.

2 Answers


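Since you are using CSV format with " as the quote character, your data most likely contains an extra " that terminates a field in the wrong place. You will need to remove the extra quote or escape it with "\".

You can define the external table to log errors and then use gp_read_error_log() to read them.

For more details on error logging, see the Admin Guide: http://gpdb.docs.pivotal.io/4360/admin_guide/load/topics/g-define-an-external-table-with-single-row-error-isolation.html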
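
As a rough sketch of that approach, the statements below redefine the table with single-row error isolation and then read back the rejected rows. The name ext_table_logged is hypothetical, and (LIKE ext_table) is only used to avoid repeating the column list. The LOG ERRORS form without INTO and the gp_read_error_log() function are what the linked Greenplum 4.3 guide describes; whether your HAWQ release accepts them is an assumption you should check. If it does not, an explicit error table (LOG ERRORS INTO some_table) is the alternative.

```sql
-- Hypothetical copy of the original table with error logging enabled.
-- Same LOCATION and FORMAT as in the question; only the last clause is new.
CREATE READABLE EXTERNAL TABLE ext_table_logged (LIKE ext_table)
LOCATION ('pxf://hostname/path/to/hdfs?profile=HdfsTextSimple')
FORMAT 'CSV' (delimiter '^' null 'null' quote '"')
ENCODING 'UTF8'
LOG ERRORS SEGMENT REJECT LIMIT 1000 ROWS;

-- Scan the table once so that malformed rows are rejected and logged
-- instead of aborting the whole query.
SELECT count(*) FROM ext_table_logged;

-- Inspect the rejected rows: source line number, error message, raw line.
SELECT linenum, errmsg, rawdata
FROM gp_read_error_log('ext_table_logged')
ORDER BY linenum;
```

The rawdata column should show the complete offending line rather than the truncated excerpt in the error message, which makes a stray " much easier to spot.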

answered 2015-12-04T14:39:49.293

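Try a PXF external table with a single text column, then parse (debug) the external data inside HAWQ with regular expressions.
An error table may also help. CSV ("comma"-separated values) is tricky. Your statement has null 'null'. If there are empty fields, e.g. "", PXF will throw an error.
My suggestion: use '' as null, then post-process the 'null' values. Ex.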

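CREATE READABLE EXTERNAL TABLE ext_table
(raw_text text)
LOCATION ('pxf://hostname/path/to/hdfs?profile=HdfsTextSimple')
-- TEXT format takes no quote option. With delimiter '^', lines that contain
-- '^' are rejected into pxf_errors (full line in rawdata); pick a delimiter
-- that never occurs in the data to get whole lines in raw_text instead.
FORMAT 'text' (delimiter '^' null '')
ENCODING 'UTF8'
LOG ERRORS INTO pxf_errors SEGMENT REJECT LIMIT 1000 ROWS;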
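
Once whole lines are readable as a single column, a query along these lines can flag the suspicious ones. This is only a sketch: it assumes a hypothetical one-column table ext_table_raw whose raw_text holds each complete source line, and it uses plain string functions rather than regular expressions. The value 122 is the number of columns in the table definition from the question.

```sql
-- Flag lines whose field count differs from the 122 columns in the DDL,
-- or that contain an odd number of double quotes (a likely stray quote).
SELECT raw_text,
       array_upper(string_to_array(raw_text, '^'), 1) AS n_fields,
       length(raw_text) - length(replace(raw_text, '"', '')) AS n_quotes
FROM ext_table_raw   -- hypothetical one-column raw table
WHERE array_upper(string_to_array(raw_text, '^'), 1) <> 122
   OR (length(raw_text) - length(replace(raw_text, '"', ''))) % 2 = 1;
```

A line with too few fields usually means a record was split by an embedded newline, while an odd quote count points at an unescaped " inside a field. A '^' inside a legitimately quoted field would also be counted as a separator here, so treat the result as a list of candidates to inspect, not a definitive verdict.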
answered 2015-12-03T15:50:48.050