
I am trying to run the example code below:

import sparknlp
sparknlp.start()

from sparknlp.pretrained import PretrainedPipeline

explain_document_pipeline = PretrainedPipeline("explain_document_ml")
annotations = explain_document_pipeline.annotate("We are very happy about SparkNLP")
print(annotations)

I am using PyCharm with an Anaconda env. Initially I installed spark-nlp with pip (`pip install spark-nlp==2.4.4`), but I saw people online saying I should instead use:

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.4
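(For reference, the argument to `--packages` is a standard Maven coordinate of the form `groupId:artifactId:version`; nothing here beyond the coordinate string itself is assumed.)

```python
# The --packages argument is a Maven coordinate: groupId:artifactId:version.
coord = "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.4"
group_id, artifact_id, version = coord.split(":")

print(group_id)     # com.johnsnowlabs.nlp
print(artifact_id)  # spark-nlp_2.11  (Scala 2.11 build of the library)
print(version)      # 2.4.4
```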

because a plain pip install may leave out some dependencies, so `pyspark --packages` is supposedly the better route. But that command gave me this error:

:: problems summary ::
:::: WARNINGS
                [NOT FOUND  ] com.typesafe#config;1.3.0!config.jar(bundle) (0ms)

        ==== local-m2-cache: tried

          file:/C:/Users/xxxxxxxx/.m2/repository/com/typesafe/config/1.3.0/config-1.3.0.jar

                [NOT FOUND  ] com.fasterxml.jackson.core#jackson-annotations;2.6.0!jackson-annotations.jar(bundle) (0ms)

        ==== local-m2-cache: tried

          file:/C:/Users/xxxxxxx/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.6.0/jackson-annotations-2.6.0.jar

                ::::::::::::::::::::::::::::::::::::::::::::::

                ::              FAILED DOWNLOADS            ::

                :: ^ see resolution messages for details  ^ ::

                ::::::::::::::::::::::::::::::::::::::::::::::

                :: com.typesafe#config;1.3.0!config.jar(bundle)

                :: com.fasterxml.jackson.core#jackson-annotations;2.6.0!jackson-annotations.jar(bundle)

                ::::::::::::::::::::::::::::::::::::::::::::::


:::: ERRORS
        unknown resolver null

        unknown resolver null

        unknown resolver null

        unknown resolver null

        unknown resolver default

        unknown resolver null


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [download failed: com.typesafe#config;1.3.0!config.jar(bundle), download failed: com.fasterxml.jackson.core#jackson-annotations;2.6.0!jackson-annotations.jar(
bundle)]
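The `local-m2-cache` paths in the warnings above follow Maven's standard local-repository layout: the group's dots become directories, followed by `artifact/version/artifact-version.jar`. A minimal sketch of that mapping, assuming only the standard layout:

```python
from pathlib import Path

def m2_jar_path(coord: str, m2_root: str = "~/.m2/repository") -> Path:
    """Map a Maven coordinate to its expected jar path in the local .m2 repository."""
    group_id, artifact_id, version = coord.split(":")
    return (Path(m2_root).expanduser()
            / group_id.replace(".", "/")   # com.typesafe -> com/typesafe
            / artifact_id
            / version
            / f"{artifact_id}-{version}.jar")

# Reproduces the path Ivy tried in the warnings above:
print(m2_jar_path("com.typesafe:config:1.3.0").as_posix())
```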

I then downloaded the two missing jars, copied them into the corresponding folders, and re-ran the command. Now everything looks fine:

21/02/05 15:41:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 3.7.1 (default, Oct 28 2018 08:39:03)
SparkSession available as 'spark'.
>>> 

Then I tried to re-run the example Python script at the top, and it gave me an error again. Here is the log:

Ivy Default Cache set to: C:\Users\xxxx\.ivy2\cache
The jars for the packages stored in: C:\Users\xxxx\.ivy2\jars
:: loading settings :: url = jar:file:/C:/Users/xxxx/AppData/Local/Continuum/anaconda3/envs/workEnv-python3.7/lib/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.johnsnowlabs.nlp#spark-nlp_2.11 added as a dependency
............

I am new to this and have been stuck on it for two days. Could anyone please help me?
