
I have an AWS EC2 cluster set up by the spark-ec2 script.

I would like to configure Zeppelin so that I can write Scala code locally in Zeppelin and run it on the cluster (via the master). Furthermore, I would like to be able to access my S3 buckets.

I followed this guide and this other one; however, I cannot seem to run Scala code from Zeppelin on my cluster.

I installed Zeppelin locally with

mvn install -DskipTests -Dspark.version=1.4.1 -Dhadoop.version=2.7.1

My IAM permissions include both the AmazonEC2FullAccess and AmazonS3FullAccess policies (these are IAM policies, not EC2 security groups).

In the Zeppelin web UI, I edited the Spark interpreter's master property from local[*] to spark://.us-west-2.compute.amazonaws.com:7077.
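For completeness, the master can also be set via `conf/zeppelin-env.sh` instead of the interpreter UI. A sketch, with placeholder values (the `ec2-xx-...` hostname and the `SPARK_HOME` path are assumptions you must replace with your own):

```shell
# conf/zeppelin-env.sh -- placeholder values, substitute your master's public DNS
export MASTER=spark://ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com:7077
# Point at a local Spark 1.4.1 install matching the cluster's version
export SPARK_HOME=/path/to/spark-1.4.1
```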

  1. When I test out

    sc
    

    in the interpreter, I receive this error

    java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
        at 
    
  2. When I edit "conf/zeppelin-site.xml" to change my port to 8082, it makes no difference.
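For reference, the relevant entry in `conf/zeppelin-site.xml` is the `zeppelin.server.port` property (Zeppelin must be restarted after editing it for the change to take effect):

```xml
<property>
  <name>zeppelin.server.port</name>
  <value>8082</value>
  <description>Server port.</description>
</property>
```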

NOTE: I eventually would also want to access my s3 buckets with something like:

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","xxx")
val file = "s3n://<<bucket>>/<<file>>"
val data = sc.textFile(file)
data.first

If any benevolent users have any advice (that wasn't already posted on Stack Overflow), please let me know!


1 Answer


Most likely your IP address is blocked from connecting to your Spark cluster. You can try launching a spark-shell pointed at that endpoint (or even just telnet to it). To fix it, you can log into your AWS account and change the firewall settings. It is also possible that it is not pointed at the right host (I assume you stripped the specific box out of spark://.us-west-2.compute.amazonaws.com:7077, but if not, there should be a bit before .us-west-2). You could also try ssh'ing into that machine and running netstat --tcp -l -n to see whether it is listening (or even just ps aux | grep java to see whether Spark is running at all).
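The reachability check suggested above can also be done directly from Scala with a plain socket probe, which reproduces exactly the `Connection refused` that Zeppelin's interpreter hits. A minimal sketch; the EC2 hostname in the usage comment is a placeholder for your real master DNS:

```scala
import java.net.{InetSocketAddress, Socket}

// Returns true if a TCP connection to host:port succeeds within timeoutMs.
// A false result here corresponds to the same "Connection refused" /
// timeout failure that the Zeppelin Spark interpreter reports.
def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port), timeoutMs)
    true
  } catch {
    case _: java.io.IOException => false
  } finally {
    socket.close()
  }
}

// Placeholder host -- substitute your master's public DNS:
// canConnect("ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com", 7077)
```

If this returns false from your local machine but true from within the cluster, the master's security group is not allowing inbound traffic on port 7077 from your IP.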

Answered 2015-09-14T07:59:35.603