My end goal is to insert data from HDFS into Elasticsearch, but the problem I am facing is connectivity.
I can connect to my Elasticsearch node with the following curl command:
curl -u username -X GET 'https://xx.xxx.xx.xxx:9200/_cat/indices?v' --insecure
But when it comes to connecting from Spark, I am unable to do so. The command I use to insert the data is:
(df.write
    .mode("append")
    .format("org.elasticsearch.spark.sql")
    .option("es.net.http.auth.user", "username")
    .option("es.net.http.auth.pass", "password")
    .option("es.index.auto.create", "true")
    .option("es.nodes", "https://xx.xxx.xx.xxx")
    .option("es.port", "9200")
    .save("my-index/my-doctype"))
The error I get is:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
....
....
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[xx.xxx.xx.xxx:9200]]
....
...
So, what is the PySpark equivalent of curl --insecure here?
Thanks.
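For reference, here is a minimal sketch of the direction I am considering. I have not verified it against this cluster. It assumes the elasticsearch-hadoop settings `es.net.ssl`, `es.net.ssl.cert.allow.self.signed`, and `es.nodes.wan.only` (the last one is named in the error above) are the relevant ones, and that `es.nodes` should hold the bare host without the `https://` prefix once SSL is enabled. The helper names `es_write_options` and `write_to_es` are hypothetical, made up for illustration. As far as I can tell, allowing self-signed certificates is not an exact equivalent of `--insecure`, which skips certificate verification altogether.

```python
def es_write_options(host, port, user, password):
    """Build connector options for an HTTPS Elasticsearch node.

    Hypothetical helper: the option keys are elasticsearch-hadoop
    settings, but whether this combination is sufficient for a given
    cluster is an assumption.
    """
    return {
        "es.nodes": host,                 # bare host, no "https://" (assumption)
        "es.port": str(port),
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
        "es.index.auto.create": "true",
        "es.net.ssl": "true",             # talk HTTPS instead of HTTP
        "es.net.ssl.cert.allow.self.signed": "true",  # accept self-signed certs
        "es.nodes.wan.only": "true",      # use only the declared node, no discovery
    }


def write_to_es(df, resource, options):
    """Append a Spark DataFrame to the given Elasticsearch resource."""
    (df.write
       .mode("append")
       .format("org.elasticsearch.spark.sql")
       .options(**options)
       .save(resource))
```

It would be called as `write_to_es(df, "my-index/my-doctype", es_write_options("xx.xxx.xx.xxx", 9200, "username", "password"))`. If the certificate is still rejected, my understanding is that a truststore would have to be supplied through `es.net.ssl.truststore.location`, but I have not tried that.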