I'm trying to work through a Spark tutorial that comes with the Cloudera virtual machine. Even though I'm using the correct line-ending setting, I cannot execute the scripts because I get a long list of errors. The tutorial is part of the Coursera course Introduction to Big Data Analytics. The assignment can be found here.
So here's what I did. Install the IPython shell (if not yet done):
sudo easy_install ipython==1.2.1
Start the PySpark shell with the spark-csv package (I tried both version 1.2.0 and 1.4.0):
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.2.0
Set the line endings to Windows style. The file uses Windows line endings, and the course says to do this. If you don't, you get other errors.
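To double-check that the file really has Windows (CRLF) line endings, a quick check like the following could be used. This is only a sketch: `count_line_endings` is a helper name I made up for illustration, and it reads the whole file into memory, which I assume is fine for a CSV of this size.

```python
def count_line_endings(path):
    """Count CRLF, bare LF and bare CR line endings in a file.

    Hypothetical helper for illustration; reads the file as raw bytes
    so that no newline translation hides the real line endings.
    """
    with open(path, 'rb') as f:
        data = f.read()
    crlf = data.count(b'\r\n')
    # Every CRLF also contains one LF and one CR, so subtract those.
    lf = data.count(b'\n') - crlf
    cr = data.count(b'\r') - crlf
    return {'crlf': crlf, 'lf': lf, 'cr': cr}

# Example call (path taken from the load command in this question):
# count_line_endings('/usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv')
```

If the `crlf` count dominates and the other two are zero or close to it, the file is in Windows style.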
Then I try to load the CSV file:
yelp_df = sqlCtx.load(
    source='com.databricks.spark.csv',
    header='true',
    inferSchema='true',
    path='file:///usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv')
But I get a very long list of errors, which starts like this:
Py4JJavaError: An error occurred while calling o23.load.
: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at
The full error message can be seen here. This is my /etc/hive/conf/hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->
<!-- Hive Execution Parameters -->
<description>JDBC connect string for a JDBC metastore</description>
<description>Driver class name for a JDBC metastore</description>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
Any help or ideas on how to solve this? I guess it's a fairly common error, but I haven't found a solution yet.
One more thing: is there a way to dump such long error messages into a separate log file?
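To make that last question concrete, this is roughly what I have in mind: a minimal sketch that wraps a call and appends the Python traceback to a file. `run_and_log` and `load_error.log` are hypothetical names of mine, and I'm assuming the Java stack trace is part of the text of the Py4JJavaError. I haven't confirmed that this captures everything Spark prints to the console.

```python
import traceback

def run_and_log(func, log_path, *args, **kwargs):
    """Call func(*args, **kwargs); on failure, append the full traceback
    to log_path and return None instead of raising.

    Hypothetical helper for illustration only.
    """
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        with open(log_path, 'a') as log:
            # format_exc() returns the traceback of the exception being handled.
            log.write(traceback.format_exc())
            log.write('\n')
        print('Error logged to %s: %s' % (log_path, type(exc).__name__))
        return None

# Intended use in the PySpark shell (sqlCtx is provided by the shell):
# yelp_df = run_and_log(sqlCtx.load, 'load_error.log',
#                       source='com.databricks.spark.csv',
#                       header='true', inferSchema='true',
#                       path='file:///usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv')
```

That would at least keep the console readable and leave the long message in a file I can attach, but maybe there is a built-in way to do this.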