I am trying to ingest data from a MySQL table into HDFS, but it gives me the following error:

IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582873318919_0] 504 - Processing record incurs an unexpected exception:

java.lang.RuntimeException: Unable to convert field:derivedwatermarkcolumn for value:"abc" for record: 
{"id":"1","name":"abc","password":"abc","derivedwatermarkcolumn":"abc"}
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:647)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
    at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
    at org.apache.gobblin.runtime.Task.run(Task.java:362)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
    ... 22 more
IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582893709536_0] 567 - Task task_GobblinMySql_1582893709536_0 failed
java.lang.RuntimeException: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:505)
    at org.apache.gobblin.runtime.Task.run(Task.java:362)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
    at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
    ... 12 more


Below is the record schema:

IST INFO  [JobScheduler-0] org.apache.gobblin.source.jdbc.JdbcExtractor [demo_user_1582893709536_0] 361 - Schema:[

{"columnName":"id","dataType":{"type":"int"},"isWaterMark":false,"primaryKey":1,"length":0,"precision":10,"scale":0,"isNullable":false,"format":"","comment":"","isUnique":false},

{"columnName":"name","dataType":{"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNullable":true,"format":"","comment":"","isUnique":false},

{"columnName":"password","dataType":{"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNullable":true,"format":"","comment":"","isUnique":false},

{"columnName":"derivedwatermarkcolumn","dataType":{"type":"timestamp"},"isWaterMark":true,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNullable":false,"comment":"Default watermark column","isUnique":false}]

The watermark column derivedwatermarkcolumn has the data type timestamp, but in the record its value is the string 'abc'.

The job and properties files are below.

mysql.pull

# Job properties
job.name=GobblinMySql
job.group=MySql
job.description=Data pull from MySql
job.lock.enabled=False


# Extract properties
extract.namespace=demo
extract.table.type=snapshot_only
extract.table.name=user
extract.delta.fields=name,password
extract.primary.key.fields=id

# Property to consider the extract as full dump
extract.is.full=true

# Source properties
source.querybased.schema=demo
source.entity=user
source.querybased.extract.type=snapshot

mysql.properties

# Source properties - source class to extract data from Mysql Source
source.class=org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource

# Source properties
source.max.number.of.partitions=1
source.querybased.partition.interval=1
source.querybased.is.compression=false
source.querybased.watermark.type=timestamp

# Source connection properties
source.conn.driver=com.mysql.jdbc.Driver
source.conn.username=root
source.conn.password=root
source.conn.host=localhost
source.conn.port=3306
source.conn.timeout=1500

# Converter properties - Record from mysql source will be processed by the below series of converters
converter.classes=org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter

# date columns format
converter.avro.timestamp.format=YYYY-MM-DD HH:MM:SS
converter.avro.date.format=yyyy-MM-dd
converter.avro.time.format=HH:mm:ss

# Qualitychecker properties
qualitychecker.task.policies=org.apache.gobblin.policies.count.RowCountPolicy,org.apache.gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL

# Publisher properties
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher

What in the configuration files is causing this error? If anyone knows, please help.

1 Answer

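It looks like the name of the watermark column comes from the extract.delta.fields property. In your example it is set to "name,password", so name is treated as the watermark. That matches the failing record, where derivedwatermarkcolumn carries the value of name ("abc"). Try setting it to "derivedwatermarkcolumn".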
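As a rough sketch of that suggestion, the relevant line in mysql.pull could be changed as shown here. Only extract.delta.fields differs from the question's file. The value is taken directly from the suggestion and has not been checked against a running job, so treat it as an assumption to try rather than a confirmed fix.

```properties
# mysql.pull, extract properties (all other lines left as in the question)
# Assumption: the delta field is what Gobblin uses as the watermark column,
# so it should not point at string columns such as name or password.
extract.delta.fields=derivedwatermarkcolumn
```

If the error persists after this change, the schema line logged by JdbcExtractor should still show which column is flagged with "isWaterMark":true.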

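How I found this: I looked at the code of the MysqlSource class to find where the watermark is mentioned, then used IntelliJ's inspector to trace where the data comes from. You can reach it via the context menu -> Analyze -> Analyze Data Flow to Here.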

Answered 2020-02-28T22:02:28.443