I'm trying out the latest version of PredictionIO (0.9.1). I installed PredictionIO and its dependencies by following the tutorial on this page: http://docs.prediction.io/install/install-linux/
I added the path of the `predictionio/bin` directory to my `.bashrc` file so that I can use the command-line tools from the terminal:
export PATH=$PATH:/home/wern/PredictionIO-0.9.1/bin
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
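After sourcing `.bashrc`, a quick sanity check that the exports took effect (the paths are from this install; adjust to yours):

```shell
# Re-apply the same exports as in .bashrc (install paths from above)
export PATH="$PATH:/home/wern/PredictionIO-0.9.1/bin"
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"

# Sanity check: the bin directory should now appear in PATH
echo "$PATH" | grep -o 'PredictionIO-0.9.1/bin'
# which pio   # should resolve once the directory actually exists on disk
```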
When I execute `pio-start-all`, I get the following:
Starting Elasticsearch...
Starting HBase...
starting master, logging to /home/wern/hbase-0.98.11-hadoop2/bin/../logs/hbase-me-master-mycomputer.out
Waiting 10 seconds for HBase to fully initialize...
Starting PredictionIO Event Server...
Executing `java -version` returns the following:
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
Executing `pio status` returns the following:
PredictionIO
Installed at: /home/me/PredictionIO-0.9.1
Version: 0.9.1
Apache Spark
Installed at: /home/wern/spark-1.2.1-bin-hadoop2.4
Version: 1.2.1 (meets minimum requirement of 1.2.0)
Storage Backend Connections
Verifying Meta Data Backend
Verifying Model Data Backend
Verifying Event Data Backend
[WARN] [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Test write Event Store (App Id 0)
[INFO] [HBLEvents] The table predictionio_eventdata:events_0 doesn't exist yet. Creating now...
[INFO] [HBLEvents] Removing table predictionio_eventdata:events_0...
(sleeping 5 seconds for all messages to show up...)
Your system is all ready to go.
Next I grabbed the generic recommendation template. I executed this command from my home directory, so once it completed I had a `RecommendationApp` directory:
pio template get PredictionIO/template-scala-parallel-recommendation RecommendationApp
Next I created a new PredictionIO app:
pio app new MyGenericRecommendationApp
This returned the following:
[WARN] [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[INFO] [HBLEvents] The table predictionio_eventdata:events_3 doesn't exist yet. Creating now...
[INFO] [App$] Initialized Event Store for this app ID: 3.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyGenericRecommendationApp
[INFO] [App$] ID: 3
[INFO] [App$] Access Key: C7vfcipXd0baQcZYzqr73EwSPT2Bd0YW1OTLgEdlUA9FOeBja6dyBVIKaYnQbsUO
Next, I navigated to the `RecommendationApp` engine directory and downloaded the sample data:
curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt
Then I imported it using Python:
python data/import_eventserver.py --access_key C7vfcipXd0baQcZYzqr73EwSPT2Bd0YW1OTLgEdlUA9FOeBja6dyBVIKaYnQbsUO
This successfully imported the data.
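For context, `sample_movielens_data.txt` uses MovieLens-style rows of the form `user::movie::rating`, which the import script posts to the Event Server as rating events. A minimal sketch of just the parsing step (this illustrates the file format, it is not the actual script):

```python
# Parse a MovieLens-style row "user::movie::rating",
# the format used by sample_movielens_data.txt
def parse_rating(line):
    user, movie, rating = line.strip().split("::")
    return {"user": user, "movie": movie, "rating": float(rating)}

sample = "1::24::3"
print(parse_rating(sample))  # {'user': '1', 'movie': '24', 'rating': 3.0}
```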
Next I updated the `engine.json` file to match the ID of the app I created earlier:
"datasource": {
"params" : {
"appId": 3
}
},
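For reference, the surrounding `engine.json` looks roughly like this; only `appId` was changed. The `engineFactory` name here matches the `com.wern` package visible in the training log, and the ALS parameters are the template's defaults, shown only for context:

```json
{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.wern.RecommendationEngine",
  "datasource": {
    "params": {
      "appId": 3
    }
  },
  "algorithms": [
    {
      "name": "als",
      "params": {
        "rank": 10,
        "numIterations": 20,
        "lambda": 0.01
      }
    }
  ]
}
```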
Then I executed `pio build`. This took a while, but it eventually returned the following:
[INFO] [Console$] Your engine is ready for training.
Finally, here is where my problem lies. Executing `pio train` results in the following:
[INFO] [Console$] Using existing engine manifest JSON at /home/wern/RecommendationApp/manifest.json
[WARN] [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[INFO] [RunWorkflow$] Submission command: /home/wern/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class io.prediction.workflow.CreateWorkflow --name PredictionIO Training: RTn3BZbRfxOlOkDQCHBmOaMBHTP1gmOg 92c46ac3197f8bf4696281a1f76eaaa943495d3f () --jars file:/home/wern/.pio_store/engines/RTn3BZbRfxOlOkDQCHBmOaMBHTP1gmOg/92c46ac3197f8bf4696281a1f76eaaa943495d3f/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/home/wern/.pio_store/engines/RTn3BZbRfxOlOkDQCHBmOaMBHTP1gmOg/92c46ac3197f8bf4696281a1f76eaaa943495d3f/template-scala-parallel-recommendation_2.10-0.1-SNAPSHOT.jar --files /home/wern/PredictionIO-0.9.1/conf/log4j.properties,/home/wern/PredictionIO-0.9.1/conf/hbase-site.xml --driver-class-path /home/wern/PredictionIO-0.9.1/conf:/home/wern/PredictionIO-0.9.1/conf /home/wern/PredictionIO-0.9.1/lib/pio-assembly-0.9.1.jar --env PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_HBASE_HOSTS=0,PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata,PIO_FS_BASEDIR=/home/wern/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_STORAGE_SOURCES_HBASE_HOME=/home/wern/hbase-0.98.11-hadoop2,PIO_HOME=/home/wern/PredictionIO-0.9.1,PIO_FS_ENGINESDIR=/home/wern/.pio_store/engines,PIO_STORAGE_SOURCES_HBASE_PORTS=0,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=predictionio_eventdata,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/home/wern/elasticsearch-1.4.4,PIO_FS_TMPDIR=/home/wern/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_,PIO_STORAGE_SOURCES_LOCALFS_HOSTS=/home/wern/.pio_store/models,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/home/wern/PredictionIO-0.9.1/conf,PIO_STORAGE_SOURCES_LOCALFS_PORTS=0,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs --engine-id RTn3BZbRfxOlOkDQCHBmOaMBHTP1gmOg 
--engine-version 92c46ac3197f8bf4696281a1f76eaaa943495d3f --engine-variant /home/wern/RecommendationApp/engine.json --verbosity 0
Spark assembly has been built with Hive, including Datanucleus jars on classpath
[WARN] [NativeCodeLoader] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(3))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[WARN] [Utils] Your hostname, fraukojiro resolves to a loopback address: 127.0.1.1; using 192.168.254.105 instead (on interface wlan0)
[WARN] [Utils] Set SPARK_LOCAL_IP if you need to bind to another address
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.254.105:37397]
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: com.wern.DataSource@653fb8d1
[INFO] [Engine$] Preparator: com.wern.Preparator@93501be
[INFO] [Engine$] AlgorithmList: List(com.wern.ALSAlgorithm@3c25cfe1)
[INFO] [Engine$] Data santiy check is on.
[ERROR] [HBPEvents] The appId 3 does not exist. Please use valid appId.
Exception in thread "main" java.lang.Exception: HBase table not found for appId 3.
at io.prediction.data.storage.hbase.HBPEvents.checkTableExists(HBPEvents.scala:54)
at io.prediction.data.storage.hbase.HBPEvents.find(HBPEvents.scala:70)
at com.wern.DataSource.readTraining(DataSource.scala:32)
at com.wern.DataSource.readTraining(DataSource.scala:18)
at io.prediction.controller.PDataSource.readTrainingBase(DataSource.scala:41)
at io.prediction.controller.Engine$.train(Engine.scala:518)
at io.prediction.controller.Engine.train(Engine.scala:147)
at io.prediction.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:61)
at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:258)
at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Basically, it doesn't recognize the appId I supplied. But executing `pio app list` shows that the ID really is 3:
[INFO] [App$] Name | ID | Access Key | Allowed Event(s)
[INFO] [App$] TestRecommendation | 2 | GJBuFYODWTwFBVQ2D2nbBFW5C0iKClNLEMbYGGhDGoZGEtLre62BLwLJlioTEeJP | (all)
[INFO] [App$] MyGenericRecommendationApp | 3 | C7vfcipXd0baQcZYzqr73EwSPT2Bd0YW1OTLgEdlUA9FOeBja6dyBVIKaYnQbsUO | (all)
[INFO] [App$] Finished listing 2 app(s).
Any ideas?