I am trying to ingest data from a Kafka topic into HDFS following https://gobblin.readthedocs.io/en/latest/case-studies/Kafka-HDFS-Ingestion/
The steps I am following:
Start ZooKeeper
$ zookeeper-server-start.bat C:\Users\name\kafka_2.11-1.1.0\config\zookeeper.properties
Start Kafka
$ kafka-server-start.bat C:\Users\name\kafka_2.11-1.1.0\config\server.properties
Create the Kafka topic if it does not already exist
$ kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
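To have something to ingest, test events can be published with the console producer bundled with Kafka (a sketch; each line typed at the prompt becomes one message, and the messages themselves are arbitrary):
$ kafka-console-producer.bat --broker-list localhost:9092 --topic test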
Start Hadoop
$ C:\Users\name\hadoop-3.1.3\sbin\start-all.cmd
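As a quick sanity check that HDFS is up (jps should list NameNode and DataNode, and dfsadmin prints a cluster summary):
$ jps
$ hdfs dfsadmin -report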
Create kafka-hdfs.pull in GOBBLIN_JOB_CONFIG_DIR as follows:
job.name=GobblinKafkaQuickStart
job.group=GobblinKafka
job.description=Gobblin quick start job for Kafka
job.lock.enabled=false
kafka.brokers=localhost:9092
source.class=org.apache.gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=org.apache.gobblin.extract.kafka
writer.builder.class=org.apache.gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
mr.job.max.mappers=1
metrics.reporting.file.enabled=true
metrics.log.dir=/gobblin-kafka/metrics
metrics.reporting.file.suffix=txt
bootstrap.with.offset=earliest
fs.uri=hdfs://localhost:9000
writer.fs.uri=hdfs://localhost:9000
state.store.fs.uri=hdfs://localhost:9000
mr.job.root.dir=/gobblin-kafka/working
state.store.dir=/gobblin-kafka/state-store
task.data.root.dir=/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/gobblintest/job-output
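(The HDFS paths referenced above do not exist on a fresh cluster; presumably they should be created up front, e.g. with the stock hdfs CLI, paths copied from the config:)
$ hdfs dfs -mkdir -p /gobblin-kafka/working /gobblin-kafka/state-store /gobblin-kafka/metrics
$ hdfs dfs -mkdir -p /jobs/kafkaetl/gobblin/gobblin-kafka/task-data /gobblintest/job-output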
Set GOBBLIN_WORK_DIR
$ export GOBBLIN_WORK_DIR=/mnt/c/users/name/incubator-gobblin/GOBBLIN_WORK_DIR
Set GOBBLIN_JOB_CONFIG_DIR
$ export GOBBLIN_JOB_CONFIG_DIR=/mnt/c/users/name/incubator-gobblin/GOBBLIN_JOB_CONFIG_DIR
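(The /mnt/c/... paths indicate these commands are run from WSL; if the directories do not exist yet, they can be created first:)
$ mkdir -p /mnt/c/users/name/incubator-gobblin/GOBBLIN_WORK_DIR
$ mkdir -p /mnt/c/users/name/incubator-gobblin/GOBBLIN_JOB_CONFIG_DIR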
Start Gobblin in standalone mode
$ bin/gobblin.sh service standalone start
Below are some of the errors found in logs/standalone.out:
[JobScheduler-0] org.apache.gobblin.scheduler.JobScheduler$NonScheduledJobRunner 637 - Failed to run job GobblinKafkaQuickStart
org.apache.gobblin.runtime.JobException: Failed to run job GobblinKafkaQuickStart
ERROR [ForkExecutor-0] org.apache.gobblin.runtime.fork.Fork 258 - Fork 0 of task task_GobblinKafkaQuickStart_1580883582897_0 failed to process data records. Set throwable in holder org.apache.gobblin.runtime.ForkThrowableHolder@721ea24d
java.lang.RuntimeException: Error creating writer
ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task 545 - Task task_GobblinKafkaQuickStart_1580883582897_0 failed
java.lang.RuntimeException: Some forks failed.
ERROR [Commit-thread-0] org.apache.gobblin.runtime.SafeDatasetCommit 196 - Failed to persist dataset state for dataset of job job_GobblinKafkaQuickStart_1580883582897
org.apache.hadoop.security.AccessControlException: Permission denied: user=name, access=WRITE, inode="/":name:supergroup:drwxrwxr-x
ERROR [JobScheduler-0] org.apache.gobblin.util.executors.IteratorExecutor 163 - Iterator executor failure.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=name, access=WRITE, inode="/":name:supergroup:drwxrwxr-x
ERROR [JobScheduler-0] org.apache.gobblin.runtime.AbstractJobLauncher 521 - Failed to launch and run job job_GobblinKafkaQuickStart_1580883582897: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart_1580883582897
java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart_1580883582897
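The AccessControlException suggests an HDFS permission problem (a WRITE on "/" denied for user=name). For reference, the current permissions can be inspected and the job directories handed to that user with the hdfs CLI, though I am not sure this is the right fix:
$ hdfs dfs -ls /
$ hdfs dfs -chown -R name:supergroup /gobblin-kafka /jobs /gobblintest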
Please let me know how to resolve this.