I am trying to run the sample code below:
import sparknlp
sparknlp.start()
from sparknlp.pretrained import PretrainedPipeline
explain_document_pipeline = PretrainedPipeline("explain_document_ml")
annotations = explain_document_pipeline.annotate("We are very happy about SparkNLP")
print(annotations)
I am using PyCharm with an Anaconda env. Initially I installed spark-nlp with pip: pip install spark-nlp==2.4.4
But I saw people online saying that I should instead use:
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.4
because a plain pip install might be missing some dependencies, so launching with pyspark --packages is supposedly the better approach. However, that gave me this error:
:: problems summary ::
:::: WARNINGS
[NOT FOUND ] com.typesafe#config;1.3.0!config.jar(bundle) (0ms)
==== local-m2-cache: tried
file:/C:/Users/xxxxxxxx/.m2/repository/com/typesafe/config/1.3.0/config-1.3.0.jar
[NOT FOUND ] com.fasterxml.jackson.core#jackson-annotations;2.6.0!jackson-annotations.jar(bundle) (0ms)
==== local-m2-cache: tried
file:/C:/Users/xxxxxxx/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.6.0/jackson-annotations-2.6.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: FAILED DOWNLOADS ::
:: ^ see resolution messages for details ^ ::
::::::::::::::::::::::::::::::::::::::::::::::
:: com.typesafe#config;1.3.0!config.jar(bundle)
:: com.fasterxml.jackson.core#jackson-annotations;2.6.0!jackson-annotations.jar(bundle)
::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
unknown resolver null
unknown resolver null
unknown resolver null
unknown resolver null
unknown resolver default
unknown resolver null
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [download failed: com.typesafe#config;1.3.0!config.jar(bundle), download failed: com.fasterxml.jackson.core#jackson-annotations;2.6.0!jackson-annotations.jar(
bundle)]
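For what it's worth, my understanding (an assumption on my part, not something from the docs) is that the --packages argument is just a Maven coordinate of the form groupId:artifactId:version, with the Scala version appended to the artifact name. A small sketch of how the command above is assembled:

```python
# Sketch: how the `pyspark --packages` coordinate is put together.
# The Scala version (2.11) is part of the artifact name; my assumption is
# that it has to match the Scala build of the installed Spark (Spark 2.4.x
# ships with Scala 2.11 by default).
group_id = "com.johnsnowlabs.nlp"
artifact = "spark-nlp_2.11"
version = "2.4.4"

coordinate = f"{group_id}:{artifact}:{version}"
command = f"pyspark --packages {coordinate}"
print(command)  # pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.4
```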
So I downloaded the two missing jars and copied them into the corresponding folders, then ran the command again, and now everything looks fine:
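Concretely, the layout I ended up with in the local Maven cache (username redacted, same paths as in the log above) was:

```
C:\Users\xxxx\.m2\repository\com\typesafe\config\1.3.0\config-1.3.0.jar
C:\Users\xxxx\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.6.0\jackson-annotations-2.6.0.jar
```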
21/02/05 15:41:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/
Using Python version 3.7.1 (default, Oct 28 2018 08:39:03)
SparkSession available as 'spark'.
>>>
Then I tried to re-run the sample Python script from the top, and it failed again; here is the log:
Ivy Default Cache set to: C:\Users\xxxx\.ivy2\cache
The jars for the packages stored in: C:\Users\xxxx\.ivy2\jars
:: loading settings :: url = jar:file:/C:/Users/xxxx/AppData/Local/Continuum/anaconda3/envs/workEnv-python3.7/lib/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.johnsnowlabs.nlp#spark-nlp_2.11 added as a dependency
............
I am new to this and have been struggling with it for two days. Could anyone please help me?