
I am trying to write a Spark DataFrame to Elasticsearch as follows:

df.write.format("es").save("db/test")

Unfortunately, I get the following error:

Py4JJavaError: An error occurred while calling o50.save.:
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0
in stage 3.0 (TID 8, localhost, executor driver): 
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data
nodes with HTTP-enabled available; node discovery is disabled and none
of nodes specified fit the criterion [XX.XXX.XX.X:XXXX]
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:152)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:549)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I am using:

spark = 2.1.0, scala = 2.11, elasticsearch = 2.4.5, Jupyter notebook

and I start it with the following command:

sudo PYSPARK_DRIVER_PYTHON=jupyter-notebook $SPARK_HOME/bin/pyspark --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.3.1 --conf spark.es.nodes="52.XXX.XX.XX" --conf spark.es.port="XXX" --conf spark.es.nodes.discovery=false --conf spark.es.net.http.auth.user="user" --conf spark.es.net.http.auth.pass="password"
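For reference, this is a minimal sketch (not my working code) of how the same settings could be passed per-write via `DataFrameWriter` options instead of `--conf` flags. `es.nodes.wan.only` is an elasticsearch-hadoop setting that disables node discovery and routes every request through the declared nodes only, which is the usual configuration when Spark runs outside the Elasticsearch cluster's network (e.g. against a cloud-hosted ES); the host and port values here are placeholders:

```python
# Connector options mirroring the --conf flags above (host/port are
# placeholders, as in the question). "es.nodes.wan.only" tells the
# connector to talk only to the listed nodes over HTTP and to skip
# data-node discovery entirely.
es_options = {
    "es.nodes": "52.XXX.XX.XX",        # placeholder remote ES host
    "es.port": "9200",                 # assumed default HTTP port
    "es.nodes.wan.only": "true",       # treat ES as a WAN endpoint
    "es.nodes.discovery": "false",
    "es.net.http.auth.user": "user",
    "es.net.http.auth.pass": "password",
}

# With a live SparkSession `spark` and DataFrame `df`, the write would be:
# (df.write.format("org.elasticsearch.spark.sql")
#    .options(**es_options)
#    .mode("append")
#    .save("db/test"))
```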

Likewise, when using spark.es.nodes.discovery=true, I get this error:

Py4JJavaError: An error occurred while calling o50.save.: 
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 3 in stage 3.0 failed 1 times, most recent failure: Lost task 3.0
in stage 3.0 (TID 11, localhost, executor driver):
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection
error (check network and/or proxy settings)- all nodes failed; tried
[[127.0.0.1:9200]] 
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:150)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:469)
at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:537)
at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:543)
at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:412)
at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:580)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:568)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Can anyone help?
