amazon-s3 - How to use Zeppelin to access aws spark-ec2 cluster and s3 buckets

Question

I have an AWS EC2 cluster set up by the spark-ec2 script.

I would like to configure Zeppelin so that I can write Scala code locally in Zeppelin and run it on the cluster (via the master). Furthermore, I would like to be able to access my S3 buckets.

I followed this guide and this other one; however, I cannot seem to run Scala code from Zeppelin on my cluster.

I installed Zeppelin locally with

mvn install -DskipTests -Dspark.version=1.4.1 -Dhadoop.version=2.7.1

My security groups were set to both AmazonEC2FullAccess and AmazonS3FullAccess.

I edited the Spark interpreter properties in the Zeppelin web app, changing the master from local[*] to spark://.us-west-2.compute.amazonaws.com:7077

When I test out

sc

in the interpreter, I receive this error:

java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    at

Editing "conf/zeppelin-site.xml" to change my port to 8082 made no difference.

NOTE: Eventually I would also like to access my S3 buckets with something like:

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","xxx")
val file = "s3n://<<bucket>>/<<file>>"
val data = sc.textFile(file)
data.first
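As a side note on the snippet above, the keys do not have to be hard-coded in the notebook. The following is only a sketch, assuming the credentials are exported as the standard AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the environment of the process that runs the code; the helper name s3nSettings is made up for illustration.

```scala
// Standard AWS environment variable names mapped to the s3n Hadoop keys used above.
val s3nKeys = Seq(
  "AWS_ACCESS_KEY_ID"     -> "fs.s3n.awsAccessKeyId",
  "AWS_SECRET_ACCESS_KEY" -> "fs.s3n.awsSecretAccessKey"
)

// Hypothetical helper: returns Right(settings) when every variable is present,
// otherwise Left(names of the missing variables).
def s3nSettings(env: Map[String, String]): Either[Seq[String], Seq[(String, String)]] = {
  val missing = s3nKeys.collect { case (envName, _) if !env.contains(envName) => envName }
  if (missing.nonEmpty) Left(missing)
  else Right(s3nKeys.map { case (envName, hadoopKey) => hadoopKey -> env(envName) })
}

s3nSettings(sys.env) match {
  case Right(settings) =>
    // Where `sc` exists (Zeppelin paragraph or spark-shell), apply them with:
    //   settings.foreach { case (k, v) => sc.hadoopConfiguration.set(k, v) }
    println(s"Found ${settings.size} s3n settings")
  case Left(missing) =>
    println(s"Missing environment variables: ${missing.mkString(", ")}")
}
```

This keeps the secret out of the saved notebook, at the cost of having to set the variables wherever the Spark interpreter process is started.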

If any benevolent users have any advice (that wasn't already posted on StackOverflow), please let me know!

score 2 · Accepted Answer

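Most likely your IP address is blocked from connecting to your Spark cluster. You can check by launching spark-shell pointed at that endpoint (or even just by telnetting to it). To fix it, log into your AWS account and change the firewall settings. It is also possible that it isn't pointed at the correct host (I'm assuming you removed the specific box from spark://.us-west-2.compute.amazonaws.com:7077, but if not, there should be something before the .us-west-2). You can try ssh'ing to that machine and running netstat --tcp -l -n to see if it is listening (or even just ps aux |grep java to see if Spark is running).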
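The telnet-style check can also be done from Scala on the machine where Zeppelin runs. This is a minimal sketch, not part of Spark or Zeppelin: canConnect is a made-up helper built on java.net.Socket, and the host name is a placeholder to replace with your master's public DNS name.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Returns true if a TCP connection to host:port succeeds within timeoutMs.
// Any failure (refused, timed out, unknown host) is reported as false.
def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
  val socket = new Socket()
  try {
    Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
  } finally {
    socket.close()
  }
}

// Placeholder host name (assumption): substitute the master's public DNS name.
val masterHost = "your-master-public-dns.example.invalid"
println(s"Port 7077 reachable: ${canConnect(masterHost, 7077)}")
```

If this prints false from your machine while netstat on the master shows port 7077 listening, the firewall settings are the likely culprit.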
