
I'm setting up GeoSpark Python and, after installing all the prerequisites, I'm running the very basic code examples to test it.

from pyspark.sql import SparkSession
from geo_pyspark.register import GeoSparkRegistrator


spark = SparkSession.builder.\
        getOrCreate()

GeoSparkRegistrator.registerAll(spark)

df = spark.sql("""SELECT st_GeomFromWKT('POINT(6.0 52.0)') as geom""")

df.show()

I tried running it with python3 basic.py and with spark-submit basic.py; both give me this error:

Traceback (most recent call last):
  File "/home/jessica/Downloads/geo_pyspark/basic.py", line 8, in <module>
    GeoSparkRegistrator.registerAll(spark)
  File "/home/jessica/Downloads/geo_pyspark/geo_pyspark/register/geo_registrator.py", line 22, in registerAll
    cls.register(spark)
  File "/home/jessica/Downloads/geo_pyspark/geo_pyspark/register/geo_registrator.py", line 27, in register
    spark._jvm. \
TypeError: 'JavaPackage' object is not callable

I'm using Java 8, Python 3, and Apache Spark 2.4 on Linux Mint 19. My JAVA_HOME is set correctly, and my SPARK_HOME is also set:

$ printenv SPARK_HOME
/home/jessica/spark/

How can I fix this?


2 Answers


The jars for GeoSpark are not correctly registered with your Spark session. There are a few ways around this, ranging from a tad inconvenient to pretty seamless. For example, if, when you call spark-submit, you specify:

--jars jar1.jar,jar2.jar,jar3.jar

then the problem will go away. You can also pass the same option to pyspark, if that's your poison.

If, like me, you don't really want to be doing this every time you boot (and setting it via .config() in Jupyter will get tiresome), then you can instead go into $SPARK_HOME/conf/spark-defaults.conf and set:

spark.jars jar1.jar,jar2.jar,jar3.jar

These will then be loaded whenever you create a Spark session. If you've not used the conf file before, it'll be there as spark-defaults.conf.template; copy it to spark-defaults.conf first.
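
If you'd rather keep this in code, the same property can be set on the session builder. A minimal sketch, still using the placeholder jar names (swap them for your real GeoSpark jars):

from pyspark.sql import SparkSession

# The jar paths below are placeholders; point them at the actual GeoSpark jars on your machine.
spark = SparkSession.builder \
    .config("spark.jars", "jar1.jar,jar2.jar,jar3.jar") \
    .getOrCreate()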

Of course, when I say jar1.jar..., what I really mean is something along the lines of:

/jars/geo_wrapper_2.11-0.3.0.jar,/jars/geospark-1.2.0.jar,/jars/geospark-sql_2.3-1.2.0.jar,/jars/geospark-viz_2.3-1.2.0.jar

but it's up to you to get the right ones from the geo_pyspark package.
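
If you're not sure where the package put them, here's a small sketch for listing the jars from Python. It assumes the installed geo_pyspark package keeps them in a jars sub-directory, which may differ between versions:

import os
import geo_pyspark

# Assumption: the bundled jars live in a "jars" folder next to the package code.
jars_dir = os.path.join(os.path.dirname(geo_pyspark.__file__), "jars")
jars = sorted(f for f in os.listdir(jars_dir) if f.endswith(".jar"))
print(",".join(os.path.join(jars_dir, f) for f in jars))  # paste into --jars or spark.jars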

If you are using EMR: you need to set your cluster config JSON to

[
  {
    "classification":"spark-defaults", 
    "properties":{
      "spark.jars": "/jars/geo_wrapper_2.11-0.3.0.jar,/jars/geospark-1.2.0.jar,/jars/geospark-sql_2.3-1.2.0.jar,/jars/geospark-viz_2.3-1.2.0.jar"
      }, 
    "configurations":[]
  }
]

and also upload your jars as part of your bootstrap. You can pull them from Maven, but I just threw them on an S3 bucket:

#!/bin/bash
sudo mkdir /jars
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geo_wrapper_2.11-0.3.0.jar /jars/
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geospark-1.2.0.jar /jars/
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geospark-sql_2.3-1.2.0.jar /jars/
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geospark-viz_2.3-1.2.0.jar /jars/

If you are using an EMR Notebook: you need a magic cell at the top of your notebook:

%%configure -f
{
  "jars": [
    "s3://geospark-test-ds/bootstrap/geo_wrapper_2.11-0.3.0.jar",
    "s3://geospark-test-ds/bootstrap/geospark-1.2.0.jar",
    "s3://geospark-test-ds/bootstrap/geospark-sql_2.3-1.2.0.jar",
    "s3://geospark-test-ds/bootstrap/geospark-viz_2.3-1.2.0.jar"
  ]
}
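
Once the jars are on the session, the registration from the question should go through; a quick check for the next cell (same code as in the question):

from geo_pyspark.register import GeoSparkRegistrator

# With the GeoSpark jars on the classpath, registerAll should no longer raise
# "'JavaPackage' object is not callable".
GeoSparkRegistrator.registerAll(spark)
spark.sql("SELECT st_GeomFromWKT('POINT(6.0 52.0)') AS geom").show()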
Answered 2020-02-03T13:22:15.783

I was seeing a similar kind of issue with the SparkMeasure jars on a Windows 10 machine:

self.stagemetrics = self.sc._jvm.ch.cern.sparkmeasure.StageMetrics(self.sparksession._jsparkSession)
TypeError: 'JavaPackage' object is not callable

So what I did was:

  1. Went to 'SPARK_HOME', launched the PySpark shell, and installed the required jar

    bin/pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.16

  2. Grabbed that jar (ch.cern.sparkmeasure_spark-measure_2.12-0.16.jar) and copied it into the jars folder of 'SPARK_HOME'

  3. Reran the script, and it now worked without the above error (a quick check is sketched below).
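
For completeness, a minimal sketch of the kind of check that now works, assuming the sparkmeasure Python wrapper (pip install sparkmeasure) is also installed; your own script may look different:

from pyspark.sql import SparkSession
from sparkmeasure import StageMetrics

spark = SparkSession.builder.getOrCreate()

# If the jar is visible to the JVM, constructing StageMetrics no longer raises
# "'JavaPackage' object is not callable".
stagemetrics = StageMetrics(spark)
stagemetrics.begin()
spark.sql("SELECT count(*) FROM range(1000)").show()
stagemetrics.end()
stagemetrics.print_report()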

Answered 2020-08-26T00:08:20.413