1

我在从运行 spark 的 AWS EMR 集群连接到运行 presto 的另一个 AWS EMR 集群时遇到问题。

用python编写的代码是:

jdbcDF = spark.read \
        .format("jdbc") \
        .option("driver", "com.facebook.presto.jdbc.PrestoDriver")\
        .option("url", "jdbc:presto://ec2-xxxxxxxxxxxx.ap-southeast-2.compute.amazonaws.com:8889/hive/data-lake") \
        .option("user", "hadoop") \
        .option("dbtable", "customer") \
        .load()\

通过 aws 部署,emr add-steps带有选项--packages,\'org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.0,org.postgresql:postgresql:42.2.9,com.facebook.presto:presto-jdbc:0.60\',\

部署时会引发以下错误

org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64) org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862) 的线程“main”java.lang.reflect.UndeclaredThrowableException 中的异常) 在 org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:330) 在 org.apache.spark.executor 的 org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:237)。 CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) 原因:org.apache.spark.SparkException:在 awaitResult 中抛出异常:在 org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226) 在 org.apache.spark .rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) 在 org.apache.spark.rpc.RpcEnv。setupEndpointRefByURI(RpcEnv.scala:101) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:250) at org.apache.spark.deploy.SparkHadoopUtil$$anon $2.run(SparkHadoopUtil.scala:65) at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security。 auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) ... 4 更多原因:java.io.IOException: 无法连接到 ip -xxxx-xxx.ap-southeast-2.compute.internal/xxx-xxxx:41885 在 org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245) 在 org.apache.spark.network。客户端.TransportClientFactory。createClient(TransportClientFactory.java:187) at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198) at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox. scala:194) at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util .concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 原因: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection denied: ip-xxxxxxxxx.ap-southeast-2.compute.internal/xxxxxx:41885 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch .SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 在 io。netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) 在 io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) 在 io.netty.channel.nio.NioEventLoop。 processSelectedKey(NioEventLoop.java:633) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) at io.netty .channel.nio.NioEventLoop.run(NioEventLoop.java:459) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run( DefaultThreadFactory.java:138) ... 1 更多原因:java.net.ConnectException:连接被拒绝 ... 11 更多 LogType 结束:stderrchannel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) at io.netty.channel.nio.NioEventLoop.processSelectedKey( NioEventLoop.java:633) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 在 io.netty.channel .nio.NioEventLoop.run(NioEventLoop.java:459) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory. java:138) ... 1 更多原因:java.net.ConnectException:连接被拒绝 ... 11 更多 LogType 结束:stderrchannel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) at io.netty.channel.nio.NioEventLoop.processSelectedKey( NioEventLoop.java:633) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 在 io.netty.channel .nio.NioEventLoop.run(NioEventLoop.java:459) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory. java:138) ... 1 更多原因:java.net.ConnectException:连接被拒绝 ... 11 更多 LogType 结束:stderrNioSocketChannel.doFinishConnect(NioSocketChannel.java:323) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)在 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 在 io.netty.channel.nio.NioEventLoop.run (NioEventLoop.java:459) 在 io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) 在 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) .. . 1 更多原因:java.net.ConnectException:连接被拒绝 ... 11 更多 LogType 结束:stderrNioSocketChannel.doFinishConnect(NioSocketChannel.java:323) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)在 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 在 io.netty.channel.nio.NioEventLoop.run (NioEventLoop.java:459) 在 io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) 在 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) .. . 1 更多原因:java.net.ConnectException:连接被拒绝 ... 11 更多 LogType 结束:stderrdoFinishConnect(NioSocketChannel.java:323) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) at io .netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop .java:459) 在 io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) 在 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ... 1更多原因:java.net.ConnectException:连接被拒绝... 11 更多 LogType 结束:stderrdoFinishConnect(NioSocketChannel.java:323) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) at io .netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop .java:459) 在 io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) 在 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ... 1更多原因:java.net.ConnectException:连接被拒绝... 11 更多 LogType 结束:stderrnio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) 在 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java: 580) 在 io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) 在 io.netty.util.concurrent.SingleThreadEventExecutor $5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ... 1 更多原因:java.net.ConnectException: Connection denied ... 11 更多 LogType:stderr 结束nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) 在 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java: 580) 在 io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) 在 io.netty.util.concurrent.SingleThreadEventExecutor $5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ... 1 更多原因:java.net.ConnectException: Connection denied ... 11 更多 LogType:stderr 结束NioEventLoop.processSelectedKey(NioEventLoop.java:633) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 在 io .netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) 在 io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) 在 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator。 run(DefaultThreadFactory.java:138) ... 1 更多原因:java.net.ConnectException:连接被拒绝 ... 11 更多 LogType 结束:stderrNioEventLoop.processSelectedKey(NioEventLoop.java:633) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) 在 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) 在 io .netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) 在 io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) 在 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator。 run(DefaultThreadFactory.java:138) ... 1 更多原因:java.net.ConnectException:连接被拒绝 ... 11 更多 LogType 结束:stderr497) 在 io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) 在 io.netty.util.concurrent 的 io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)。 DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ... 1 更多原因:java.net.ConnectException:连接被拒绝 ... 11 更多 LogType 结束:stderr497) 在 io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) 在 io.netty.util.concurrent 的 io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)。 DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) ... 1 更多原因:java.net.ConnectException:连接被拒绝 ... 11 更多 LogType 结束:stderr

虽然我已经编辑了上面的 IP 地址(安全第一),但它与 spark 从属实例上的内部 IP 地址相同。它似乎正在连接到自身并出现连接问题。

我已经打开了 AWS EC2 安全组中的端口,允许从 spark 主/从访问 presto 实例。

如果有帮助,编写用于测试连接性的快速节点脚本可以工作

var client = new presto.Client({
  host: prestoEndpoint,
  user: 'hadoop',
  port: 8889,
});

client.execute({
  query: 'select * from customer',
  catalog: 'hive',
  schema: 'data-lake',
  source: 'nodejs-client',
  state: function(error, query_id, stats) {
     console.log({ message: 'status changed', id: query_id, stats: stats });
  },
  columns: function(error, data) {
     console.log({ resultColumns: data });
  },
  data: function(error, data, columns, stats) {
    console.log({data, columns});
  },
  success: function(error, stats) {
     console.log(error);
     console.log(JSON.stringify(stats, null,2));
  },
  error: function(error) {
    console.log(error);
  },
});

错误消息的关键部分似乎是

引起:io.netty.channel.AbstractChannel$AnnotatedConnectException:连接被拒绝:ip-xxxxxxxxx.ap-southeast-2.compute.internal/xxxxxx:41885

4

1 回答 1

0

问题是 prest-jdbc 驱动程序的版本号

我将它从更新com.facebook.presto:presto-jdbc:0.60到, com.facebook.presto:presto-jdbc:0.225 所以完整的包参数是

--packages,\'org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.0,org.postgresql:postgresql:42.2.9,com.facebook.presto:presto-jdbc:0.255\',\

感谢@Lamanus 发现了那个

于 2020-01-29T18:23:22.590 回答