
All the use cases we tested in our previous HDP environment work, so we wanted to move them to CDP, but whenever I try to write a CSV DataFrame into Hive it gives me this error. I have tried every library for loading CSV files from HDFS into a DataFrame. I printed the schema of the DF and it is correct.
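For reference, this is roughly the shape of the job (a minimal sketch, not the exact code; the path and table name are placeholders, and the `~` delimiter is taken from the parser configuration in the error below):

    // Minimal repro sketch (Spark 2.x with Hive support); path and table name are placeholders.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("csv-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Read the '~'-delimited CSV from HDFS
    val df = spark.read
      .option("sep", "~")        // field delimiter, per the parser config in the error
      .option("header", "true")  // first line holds the column names
      .csv("hdfs:///data/input/lead_bills.csv")  // placeholder path

    df.printSchema()  // schema prints correctly, as noted above

    // Write into Hive -- this is the step that fails on CDP
    df.write.mode("overwrite").saveAsTable("mydb.lead_bills")  // placeholder table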

21/07/14 12:13:56 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 6.0 (TID 39, datanode2.baf.com, executor 1): com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - -1
Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
Parser Configuration: CsvParserSettings:
    Auto configuration enabled=true
    Auto-closing enabled=true
    Autodetect column delimiter=false
    Autodetect quotes=false
    Column reordering enabled=true
    Delimiters for detection=null
    Empty value=
    Escape unquoted values=false
    Header extraction enabled=null
    Headers=null
    Ignore leading whitespaces=false
    Ignore leading whitespaces in quotes=false
    Ignore trailing whitespaces=true
    Ignore trailing whitespaces in quotes=false
    Input buffer size=128
    Input reading on separate thread=false
    Keep escape sequences=false
    Keep quotes=false
    Length of content displayed on error=-1
    Line separator detection enabled=false
    Maximum number of characters per column=5000000
    Maximum number of columns=20480
    Normalize escaped line separators=true
    Null value=
    Number of records to read=all
    Processor=none
    Restricting data in exceptions=false
    RowProcessor error handler=null
    Selected fields=none
    Skip bits as whitespace=true
    Skip empty lines=true
    Unescaped quote handling=STOP_AT_DELIMITER
Format configuration:
    CsvFormat:
        Comment character=\0
        Field delimiter=~
        Line separator (normalized)=\n
        Line separator sequence=\n
        Quote character="
        Quote escape character=\
        Quote escape escape character=null
Internal state when error was thrown: line=54, column=61, record=54, charIndex=133505, headers=[LEAD_CO_MNE, BRANCH_CO_MNE, MIS_DATE, @ID, CONTRACT_DATE, VALUE_DATE, START_DATE, DRAWDOWN_END_DATE, PAYMENT_START_DATE, MATURITY_DATE, ARR_AGE_STATUS, RENEWAL_DATE, COOLING_DATE, CANCEL_DATE, BASE_DATE, BILL_PAY_DATE, BILL_ID, ACTIVITY_REF, BILL_DATE, BILL_TYPE, PAY_METHOD, BILL_STATUS, SET_STATUS, AGING_STATUS, NXT_AGE_DATE, CHASER_DATE, ALL_AGE_STATUS, SUSPENDED, REPORT_END_DATE, PAYMENT_TYPE, NUM_PAYMENTS, PROPERTY, PAYMENT_DATE, ACT_PAY_DATE, FIN_PAY_DATE, REPAY_REFERENCE, RPY_BILL_ID, SUSP_STATUS, SUSP_DATE, LAST_RENEW_DATE, PAYMENT_END_DATE, BILLS_SETTLED_CNT, STATIC_UPDATE, RESERVED_5, RPY_REFERENCE, RESERVED_4, RESERVED_3, RPY_ACTUAL_DATE, ACTUAL_RENEW_DATE, RESERVED_6, RESERVED_7, RESERVED_8, RESERVED_9, RESERVED_10, RESERVED_11, RESERVED_12, RESERVED_13, RESERVED_14, RESERVED_15, RESERVED_16, RESERVED_17, RESERVED_18]
    at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:395)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:616)
    at org.apache.spark.sql.catalyst.csv.UnivocityParser$$anon$1.next(UnivocityParser.scala:331)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.TraversableOnce$FlattenOps$$anon$1.hasNext(TraversableOnce.scala:464)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:645)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:227)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:116)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:109)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1289)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at com.univocity.parsers.common.input.AbstractCharInputReader.getString(AbstractCharInputReader.java:482)
    at com.univocity.parsers.csv.CsvParser.parseSingleDelimiterRecord(CsvParser.java:185)
    at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:108)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:574)
    ... 24 more
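
The exception is raised by the univocity parser while reading the CSV (line 54 of the file, per the internal state above), not by the Hive write itself, which usually points at a quote or escape character in the data that does not match the parser settings. A hedged sketch of read options worth trying (all are standard Spark CSV options; the disabled quote/escape values are assumptions to adapt to your data):

    // Assumption: an unmatched '"' or '\' inside a field trips the parser at line 54.
    // All options below are standard Spark CSV read options.
    val df = spark.read
      .option("sep", "~")
      .option("header", "true")
      .option("quote", "\u0000")    // disable quote handling if fields are never quoted
      .option("escape", "\u0000")   // disable the escape character as well
      .option("mode", "PERMISSIVE") // null out malformed fields instead of failing the task
      .csv("hdfs:///data/input/lead_bills.csv")  // placeholder path

Inspecting line 54 of the input file around character index 133505 should confirm whether a stray quote is present.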

1 Answer


Could you please share:

1. What is your CDP version?
2. Your sample data.
3. Your Hive table DDL.

My guess is that you may need a Hive connector JAR.
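
If the target is a managed (ACID) Hive table, Spark on CDP cannot write to it directly and needs the Hive Warehouse Connector. A minimal sketch, assuming the HWC jar is on the classpath and using a placeholder table name:

    // Minimal sketch assuming the Hive Warehouse Connector jar is available;
    // the database/table name is a placeholder.
    import com.hortonworks.hwc.HiveWarehouseSession

    // The session can also run Hive DDL, e.g. hive.executeUpdate("CREATE TABLE ...")
    val hive = HiveWarehouseSession.session(spark).build()

    df.write
      .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
      .option("table", "mydb.lead_bills")
      .save()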

answered 2021-07-18T14:00:40.263