I am trying to use spark-ec2 to launch an EC2 cluster running Hadoop 2.x, so I ran:
./spark-ec2 -k spark -i ~/.ssh/spark.pem -s 1 --hadoop-major-version=2 launch my-spark-cluster
Then I noticed an error during the Tachyon setup step:
Setting up tachyon
RSYNC'ing /root/tachyon to slaves...
ec2-52-1-147-16.compute-1.amazonaws.com
ec2-52-1-147-16.compute-1.amazonaws.com: Formatting Tachyon Worker @ ip-172-31-21-86.ec2.internal
ec2-52-1-147-16.compute-1.amazonaws.com: Removing local data under folder: /mnt/ramdisk/tachyonworker/
Formatting Tachyon Master @ ec2-52-1-14-186.compute-1.amazonaws.com
Formatting JOURNAL_FOLDER: /root/tachyon/libexec/../journal/
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
at tachyon.util.CommonUtils.runtimeException(CommonUtils.java:246)
at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:73)
at tachyon.UnderFileSystemHdfs.getClient(UnderFileSystemHdfs.java:53)
at tachyon.UnderFileSystem.get(UnderFileSystem.java:53)
at tachyon.Format.main(Format.java:54)
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:69)
... 3 more
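I searched for related questions, and it seems that "Server IPC version 7 cannot communicate with client version 4" means the server is running Hadoop 2.x while the client is using Hadoop 1.x. However, I built my Spark against Hadoop 2.4.0, and I also tried the official Spark builds prebuilt for Hadoop 2.4.0 and later. Both produce the same error.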
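To make the mismatch explicit, here is a minimal Python sketch that pulls the two IPC version numbers out of the log line. It is not part of spark-ec2 or Tachyon, and the helper name parse_ipc_mismatch is my own. It only reports the numbers and does not map them to Hadoop releases.

```python
import re

# Pattern for the Hadoop RPC mismatch message that appears in the stack trace.
IPC_MISMATCH = re.compile(
    r"Server IPC version (\d+) cannot communicate with client version (\d+)"
)


def parse_ipc_mismatch(log_text):
    """Return (server_version, client_version), or None if no mismatch line is found.

    Hypothetical helper for illustration only.
    """
    match = IPC_MISMATCH.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The exception line copied from the setup output above.
log_line = (
    'Exception in thread "main" java.lang.RuntimeException: '
    "org.apache.hadoop.ipc.RemoteException: "
    "Server IPC version 7 cannot communicate with client version 4"
)

versions = parse_ipc_mismatch(log_line)
if versions is not None:
    server, client = versions
    print(f"server IPC version {server}, client IPC version {client}")
```

Note that in the trace the failing call originates in tachyon.UnderFileSystemHdfs, so the client side of this mismatch appears to be the HDFS client used by Tachyon rather than my Spark build.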
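By the way, the Hadoop version installed when I set --hadoop-major-version=2 is Hadoop 2.0.0-cdh4.2.0. Is that a problem? I tried passing 2.4 or 2.4.0 for this option instead, but neither is recognized as a valid Hadoop version.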