I am getting an error while running a Gobblin job. My core-site.xml looks fine and has the required value.

core-site.xml

<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  <description>The AbstractFileSystem for 'gs:' URIs.</description>
</property>

Error

org.apache.gobblin.runtime.ForkException: Fork branches [0] failed for task task_toGCPHIVE_1639057335724_14
<Fork 0>
java.lang.RuntimeException: Error creating writer
    at org.apache.gobblin.writer.PartitionedDataWriter$4.get(PartitionedDataWriter.java:214)
    at org.apache.gobblin.writer.PartitionedDataWriter$4.get(PartitionedDataWriter.java:207)
    at org.apache.gobblin.writer.CloseOnFlushWriterWrapper.<init>(CloseOnFlushWriterWrapper.java:73)
    at org.apache.gobblin.writer.PartitionedDataWriter.<init>(PartitionedDataWriter.java:206)
    at org.apache.gobblin.runtime.fork.Fork.buildWriter(Fork.java:562)
    at org.apache.gobblin.runtime.fork.Fork.buildWriterIfNotPresent(Fork.java:570)
    at org.apache.gobblin.runtime.fork.Fork.processRecord(Fork.java:516)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:103)
    at org.apache.gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:86)
    at org.apache.gobblin.runtime.fork.Fork.run(Fork.java:250)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: fs.AbstractFileSystem.gs.impl=null: No AbstractFileSystem configured for scheme: gs
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:160)

I am able to run gs commands from the command line without any issues. For example, hadoop fs -ls gs://<<bucketName>> produces the expected output.

Any help would be appreciated.

1 Answer

If Scala, PySpark and/or Spark are involved, beyond fixing core-site.xml there are two possible solutions.

The first relates to this question: How to fix "No FileSystem for scheme: gs" in pyspark?

The second: "No FileSystem for scheme: gs" when running a Spark job locally.

Finally, this could also be an issue with the Cloud Storage connector itself, so I suggest reviewing the Cloud Storage connector documentation to make sure your setup is applied correctly.
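As a minimal sketch of the Spark-side approach from those links: instead of relying on core-site.xml being picked up, the two GCS properties can be passed directly to the job. The jar path and script name below are placeholders for illustration; the property and class names are the ones documented for the Cloud Storage connector, but verify them against your connector version.

```shell
# Hypothetical submission command; adjust the connector jar path
# and script name to your installation.
spark-submit \
  --jars /usr/lib/hadoop/lib/gcs-connector-hadoop3-latest.jar \
  --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --conf spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
  your_job.py
```

Note that `fs.gs.impl` (the `FileSystem` implementation) and `fs.AbstractFileSystem.gs.impl` (the `AbstractFileSystem` implementation) are separate settings; the stack trace above suggests the latter is not being seen by the job even though it is present in core-site.xml, which usually means the job is not loading that configuration file at all.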

Answered 2021-12-13T16:10:22.257