
After launching this ready-to-use PredictionIO Amazon EC2 instance, I am following this quickstart. After running the commands below, it fails at the last step, pio train:

pio app new MyTextApp
pio import --appid 1 --input data/stopwords.json
pio import --appid 1 --input data/emails.json
pio build
pio train

...

Data set is empty, make sure event fields match imported data.

Exception in thread "main" java.lang.IllegalStateException: Haven't seen any document yet.
    at org.apache.spark.mllib.feature.IDF$DocumentFrequencyAggregator.idf(IDF.scala:132)
    at org.apache.spark.mllib.feature.IDF.fit(IDF.scala:56)
    at uk.co.news.PreparedData.<init>(Preparator.scala:70)
    at uk.co.news.Preparator.prepare(Preparator.scala:47)
    at uk.co.news.Preparator.prepare(Preparator.scala:43)

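Since the command that imports the e-mails ran without errors, I don't understand why the data set is still empty. I double-checked the emails.json file and the data is definitely there. This is the output of running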

pio import --appid 1 --input data/emails.json

ubuntu@ip-172-31-0-60:~/pio-textclassification$ pio import --appid 1 --input data/emails.json
[INFO] [Runner$] Submission command: /opt/spark-1.4.1-bin-hadoop2.6/bin/spark-submit --class io.prediction.tools.imprt.FileToEvents --files file:/opt/PredictionIO/conf/log4j.properties --driver-class-path /opt/PredictionIO/conf file:/opt/PredictionIO/lib/pio-assembly-0.9.4.jar --appid 1 --input file:/home/ubuntu/pio-textclassification/data/emails.json --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/ubuntu/.pio_store,PIO_HOME=/opt/PredictionIO,PIO_FS_ENGINESDIR=/home/ubuntu/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/home/ubuntu/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/opt/PredictionIO/conf
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver@172.31.0.60:49257]
[INFO] [FileToEvents$] Events are imported.
[INFO] [FileToEvents$] Done.

Edit:

pio build --verbose

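reveals a swallowed exception. The problem is with the database connection, but because part of the exception is replaced with "...", it is still unclear what went wrong: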

[DEBUG] [ConnectionPool$] Registered connection pool : ConnectionPool(url:jdbc:postgresql://localhost/pio, user:pio) using factory : <default>
[DEBUG] [ConnectionPool$] Registered singleton connection pool : ConnectionPool(url:jdbc:postgresql://localhost/pio, user:pio)
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed

  [SQL Execution]
   create table if not exists pio_meta_enginemanifests ( id varchar(100) not null primary key, version text not null, engineName text not null, description text, files text not null, engineFactory text not null); (10 ms)

  [Stack Trace]
    ...
    io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$1.apply(JDBCEngineManifests.scala:37)
    io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$1.apply(JDBCEngineManifests.scala:29)
    scalikejdbc.DBConnection$class.autoCommit(DBConnection.scala:222)
    scalikejdbc.DB.autoCommit(DB.scala:60)
    scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:215)
    scalikejdbc.DB$$anonfun$autoCommit$1.apply(DB.scala:214)
    scalikejdbc.LoanPattern$class.using(LoanPattern.scala:18)
    scalikejdbc.DB$.using(DB.scala:138)
    scalikejdbc.DB$.autoCommit(DB.scala:214)
    io.prediction.data.storage.jdbc.JDBCEngineManifests.<init>(JDBCEngineManifests.scala:29)
    sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    io.prediction.data.storage.Storage$.getDataObject(Storage.scala:293)
    ...

[INFO] [RegisterEngine$] Registering engine JmhjlGoEjJuKXhXpY70MbEkuGHMuOZzL 8ccd38126d56ed48adaa9f85547131467f7629f7
[DEBUG] [StatementExecutor$$anon$1] SQL execution completed

  [SQL Execution]
   update pio_meta_enginemanifests set engineName = 'pio-textclassification', description = 'pio-autogen-manifest', files = 'file:/home/ubuntu/pio-textclassification/target/scala-2.10/uk.co.news-assembly-0.1-SNAPSHOT-deps.jar... (192)', engineFactory = '' where id = 'JmhjlGoEjJuKXhXpY70MbEkuGHMuOZzL' and version = '8ccd38126d56ed48adaa9f85547131467f7629f7'; (3 ms)

  [Stack Trace]
    ...
    io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$7.apply(JDBCEngineManifests.scala:85)
    io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$7.apply(JDBCEngineManifests.scala:78)
    scalikejdbc.DBConnection$$anonfun$3.apply(DBConnection.scala:297)
    scalikejdbc.DBConnection$class.scalikejdbc$DBConnection$$rollbackIfThrowable(DBConnection.scala:274)
    scalikejdbc.DBConnection$class.localTx(DBConnection.scala:295)
    scalikejdbc.DB.localTx(DB.scala:60)
    scalikejdbc.DB$.localTx(DB.scala:257)
    io.prediction.data.storage.jdbc.JDBCEngineManifests.update(JDBCEngineManifests.scala:78)
    io.prediction.tools.RegisterEngine$.registerEngine(RegisterEngine.scala:50)
    io.prediction.tools.console.Console$.build(Console.scala:813)
    io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:698)
    io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:684)
    scala.Option.map(Option.scala:145)
    io.prediction.tools.console.Console$.main(Console.scala:684)
    io.prediction.tools.console.Console.main(Console.scala)
    ...

[DEBUG] [StatementExecutor$$anon$1] SQL execution completed

  [SQL Execution]
   INSERT INTO pio_meta_enginemanifests VALUES( 'JmhjlGoEjJuKXhXpY70MbEkuGHMuOZzL', '8ccd38126d56ed48adaa9f85547131467f7629f7', 'pio-textclassification', 'pio-autogen-manifest', 'file:/home/ubuntu/pio-textclassification/target/scala-2.10/uk.co.news-assembly-0.1-SNAPSHOT-deps.jar... (192)', ''); (1 ms)

  [Stack Trace]
    ...
    io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$2.apply(JDBCEngineManifests.scala:48)
    io.prediction.data.storage.jdbc.JDBCEngineManifests$$anonfun$2.apply(JDBCEngineManifests.scala:40)
    scalikejdbc.DBConnection$$anonfun$3.apply(DBConnection.scala:297)
    scalikejdbc.DBConnection$class.scalikejdbc$DBConnection$$rollbackIfThrowable(DBConnection.scala:274)
    scalikejdbc.DBConnection$class.localTx(DBConnection.scala:295)
    scalikejdbc.DB.localTx(DB.scala:60)
    scalikejdbc.DB$.localTx(DB.scala:257)
    io.prediction.data.storage.jdbc.JDBCEngineManifests.insert(JDBCEngineManifests.scala:40)
    io.prediction.data.storage.jdbc.JDBCEngineManifests.update(JDBCEngineManifests.scala:89)
    io.prediction.tools.RegisterEngine$.registerEngine(RegisterEngine.scala:50)
    io.prediction.tools.console.Console$.build(Console.scala:813)
    io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:698)
    io.prediction.tools.console.Console$$anonfun$main$1.apply(Console.scala:684)
    scala.Option.map(Option.scala:145)
    io.prediction.tools.console.Console$.main(Console.scala:684)
    ...

[INFO] [Console$] Your engine is ready for training.

2 Answers


A few things to check:

  1. Does "pio app list" show MyTextApp with appId 1?
  2. Download https://github.com/yipjustin/pio-event-distribution-checker and change engine.json so that appId is 1, then run "pio build" and "pio train" to see whether the data was actually imported.
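
The idea behind the second check can be sketched in plain Scala: count the events per (entityType, event name) pair and compare the pairs with what DataSource.scala asks for. This is not the checker's actual code. SampleEvent, EventDistribution, and the sample records are made up for illustration, and real events would come from the event store rather than an in-memory Seq.

```scala
// Minimal stand-in for an imported event (hypothetical, not PredictionIO's Event class).
final case class SampleEvent(event: String, entityType: String, properties: Map[String, String])

object EventDistribution {
  // Count how many events exist for each (entityType, event name) pair.
  def countByTypeAndName(events: Seq[SampleEvent]): Map[(String, String), Int] =
    events
      .groupBy(e => (e.entityType, e.event))
      .map { case (key, group) => key -> group.size }

  def main(args: Array[String]): Unit = {
    // Illustrative records only; the field values are assumptions.
    val events = Seq(
      SampleEvent("e-mail", "content", Map("label" -> "spam", "text" -> "win a prize now")),
      SampleEvent("e-mail", "content", Map("label" -> "not spam", "text" -> "meeting at noon")),
      SampleEvent("other-event", "other-type", Map.empty)
    )
    // Print one line per pair so a mismatch with the DataSource filter is easy to spot.
    countByTypeAndName(events).toSeq.sortBy(_._1).foreach { case ((entityType, name), n) =>
      println(s"entityType=$entityType event=$name count=$n")
    }
  }
}
```

If the pair that the engine filters on does not appear in such a listing, training sees no data even though the import itself succeeded.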

P.S. There is a Google group (https://groups.google.com/forum/#!forum/predictionio-user) where the PredictionIO user community will answer your questions faster.

answered 2016-01-14T23:45:04.190

The solution was to change DataSource.scala to match the schema in the emails.json file before running pio build.

This is the only method I had to change in that file:

  private def readEventData(sc: SparkContext): RDD[Observation] = {
    // Get an RDD of events. entityType and eventNames must match the
    // values used in the imported data.
    PEventStore.find(
      appName = dsp.appName,
      entityType = Some("content"),
      eventNames = Some(List("e-mail"))
    )(sc).map(e => {
      // Convert each event into an Observation object.
      val label: String = e.properties.get[String]("label")
      Observation(
        if (label == "spam") 1.0 else 0.0,
        e.properties.get[String]("text"),
        label
      )
    }).cache
  }

I had to change the previous values to "content", "e-mail", and "spam".
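
Why a mismatched filter leaves the data set empty can be shown with a plain-Scala sketch that mimics the shape of readEventData without Spark or PredictionIO. StoredEvent, LabeledText, FilterSketch, the "hypothetical-type" and "hypothetical-event" values, and the sample e-mails are all assumptions for illustration. Only "content", "e-mail", and "spam" come from the fix above.

```scala
// Simplified stand-ins (hypothetical); the real template uses PredictionIO's
// Event class and its own Observation class.
final case class StoredEvent(event: String, entityType: String, properties: Map[String, String])
final case class LabeledText(label: Double, text: String, category: String)

object FilterSketch {
  // Keep only events whose entityType and event name match the filter,
  // then map each one to a labeled observation (1.0 for "spam", else 0.0).
  def select(events: Seq[StoredEvent], entityType: String, eventNames: Set[String]): Seq[LabeledText] =
    events
      .filter(e => e.entityType == entityType && eventNames.contains(e.event))
      .map { e =>
        val label = e.properties("label")
        LabeledText(if (label == "spam") 1.0 else 0.0, e.properties("text"), label)
      }

  def main(args: Array[String]): Unit = {
    // Illustrative events shaped like the imported e-mails.
    val stored = Seq(
      StoredEvent("e-mail", "content", Map("label" -> "spam", "text" -> "win a prize now")),
      StoredEvent("e-mail", "content", Map("label" -> "not spam", "text" -> "meeting at noon"))
    )
    // A filter that does not match the stored values selects nothing.
    println(select(stored, "hypothetical-type", Set("hypothetical-event")).size)
    // A filter that matches the stored values selects both events.
    println(select(stored, "content", Set("e-mail")).size)
  }
}
```

With the non-matching filter nothing is selected. That corresponds to the "Data set is empty" message, after which IDF.fit has no documents to work with.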

answered 2016-01-19T16:25:30.413