我有一个处于独立模式的 Apache Spark Cluster(2.2.0)。到目前为止,一直在使用 HDFS 来存储 parquet 文件。我正在使用 Apache Hive 1.2 的 Hive Metastore 服务来访问,使用 Thriftserver,Spark over JDBC。
现在我想使用 S3 对象存储而不是 HDFS。我已将以下配置添加到我的 hive-site.xml:
<property>
<name>fs.s3a.access.key</name>
<value>access_key</value>
<description>Profitbricks Access Key</description>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>secret_key</value>
<description>Profitbricks Secret Key</description>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3-de-central.profitbricks.com</value>
<description>ProfitBricks S3 Object Storage Endpoint</description>
</property>
<property>
<name>fs.s3a.endpoint.http.port</name>
<value>80</value>
<description>ProfitBricks S3 Object Storage Endpoint HTTP Port</description>
</property>
<property>
<name>fs.s3a.endpoint.https.port</name>
<value>443</value>
<description>ProfitBricks S3 Object Storage Endpoint HTTPS Port</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>s3a://dev.spark.my_bucket/parquet/</value>
<description>Profitbricks S3 Object Storage Hive Warehouse Location</description>
</property>
我在 MySQL 5.7 数据库中有配置单元元存储。我已将以下 jar 文件添加到 Hive lib 文件夹中:
- aws-java-sdk-1.7.4.jar
- hadoop-aws-2.7.3.jar
我已删除 MySQL 上的旧配置单元元存储模式,然后使用以下命令启动元存储服务:hive --service metastore &
我收到以下错误:
java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/ObjectMapper
at com.amazonaws.util.json.Jackson.<clinit>(Jackson.java:27)
at com.amazonaws.internal.config.InternalConfig.loadfrom(InternalConfig.java:182)
at com.amazonaws.internal.config.InternalConfig.load(InternalConfig.java:199)
at com.amazonaws.internal.config.InternalConfig$Factory.<clinit>(InternalConfig.java:232)
at com.amazonaws.ServiceNameFactory.getServiceName(ServiceNameFactory.java:34)
at com.amazonaws.AmazonWebServiceClient.computeServiceName(AmazonWebServiceClient.java:703)
at com.amazonaws.AmazonWebServiceClient.getServiceNameIntern(AmazonWebServiceClient.java:676)
at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:278)
at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
at com.amazonaws.services.s3.AmazonS3Client.setEndpoint(AmazonS3Client.java:475)
at com.amazonaws.services.s3.AmazonS3Client.init(AmazonS3Client.java:447)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:391)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:371)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.ObjectMapper
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
缺少的类属于 Jackson 库,然后我复制了位于我的 spark-2.2.0-bin-hadoop2.7/jars/ 文件夹中的 Jackson-*.jar,它们是:
- jackson-annotations-2.6.5.jar
- 杰克逊核心2.6.5.jar
- 杰克逊核心asl-1.9.13.jar
- jackson-databind-2.6.5.jar
- 杰克逊-jaxrs-1.9.13.jar
- jackson-mapper-asl-1.9.13.jar
- 杰克逊模块参数-2.6.5.jar
- 杰克逊模块-scala_2.11-2.6.5.jar
- 杰克逊-xc-1.9.13.jar
但后来我收到以下错误:
2018-01-05 17:51:00,819 ERROR [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5920)) - Metastore Thrift Server threw an exception...
java.lang.NumberFormatException: For input string: "100M"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1319)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
我认为这里的错误与某些 jar 版本不兼容有关,但我无法找到正确的版本。
有人可以在这里帮助我吗?