
I am trying to run the CaffeOnSpark example by following this guide: https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_local. I can get all the way through to the Python part (step 9 in that guide). However, when I run the example steps in https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_python, I get the error shown below...

>>> from pyspark import SparkConf,SparkContext
>>> from com.yahoo.ml.caffe.RegisterContext import registerContext,registerSQLContext
>>> from com.yahoo.ml.caffe.CaffeOnSpark import CaffeOnSpark
>>> from com.yahoo.ml.caffe.Config import Config
>>> from com.yahoo.ml.caffe.DataSource import DataSource
>>> from pyspark.mllib.linalg import Vectors
>>> from pyspark.mllib.regression import LabeledPoint
>>> from pyspark.mllib.classification import LogisticRegressionWithLBFGS
>>> registerContext(sc)
>>> registerSQLContext(sqlContext)
>>> cos=CaffeOnSpark(sc,sqlContext)
>>> cfg=Config(sc)
>>> cfg.protoFile='/home/abhishek/idw-workspace/CaffeOnSpark/data/lenet_memory_solver.prototxt'
>>> cfg.modelPath = 'file:/tmp/lenet.model'
>>> cfg.devices = 1
>>> cfg.isFeature=True
>>> cfg.label='label'
>>> cfg.features=['ip1']
>>> cfg.outputFormat = 'json'
>>> cfg.clusterSize = 1
>>> cfg.lmdb_partitions=cfg.clusterSize
>>> #Train
... dl_train_source = DataSource(sc).getSource(cfg,True)
16/09/12 15:44:36 INFO DataSource$: Source data layer:0
16/09/12 15:44:36 INFO LMDB: Batch size:64
>>> cos.train(dl_train_source)
16/09/12 15:44:37 INFO SparkContext: Starting job: collect at CaffeOnSpark.scala:136
16/09/12 15:44:37 INFO DAGScheduler: Got job 0 (collect at CaffeOnSpark.scala:136) with 1 output partitions
16/09/12 15:44:37 INFO DAGScheduler: Final stage: ResultStage 0 (collect at CaffeOnSpark.scala:136)
16/09/12 15:44:37 INFO DAGScheduler: Parents of final stage: List()
16/09/12 15:44:37 INFO DAGScheduler: Missing parents: List()
16/09/12 15:44:37 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at map at CaffeOnSpark.scala:126), which has no missing parents
16/09/12 15:44:37 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.0 KB, free 4.0 KB)
16/09/12 15:44:37 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.4 KB, free 6.4 KB)
16/09/12 15:44:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.41.64:52748 (size: 2.4 KB, free: 511.1 MB)
16/09/12 15:44:37 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/09/12 15:44:37 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at map at CaffeOnSpark.scala:126)
16/09/12 15:44:37 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/09/12 15:44:37 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, impetus-1537u.impetus.co.in, partition 0,PROCESS_LOCAL, 2289 bytes)
16/09/12 15:44:38 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on impetus-1537u.impetus.co.in:41220 (size: 2.4 KB, free: 511.1 MB)
16/09/12 15:44:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, impetus-1537u.impetus.co.in): java.io.FileNotFoundException: lenet_memory_train_test.prototxt (No such file or directory)
  at java.io.FileInputStream.open0(Native Method)
  at java.io.FileInputStream.open(FileInputStream.java:195)
  at java.io.FileInputStream.<init>(FileInputStream.java:138)
  at java.io.FileInputStream.<init>(FileInputStream.java:93)
  at java.io.FileReader.<init>(FileReader.java:58)
  at com.yahoo.ml.jcaffe.Utils.GetNetParam(Utils.java:22)
  at com.yahoo.ml.caffe.Config.protoFile_$eq(Config.scala:65)
  at com.yahoo.ml.caffe.Config.solverParameter(Config.scala:323)
  at com.yahoo.ml.caffe.DataSource.init(DataSource.scala:51)
  at com.yahoo.ml.caffe.ImageDataSource.init(ImageDataSource.scala:39)
  at com.yahoo.ml.caffe.CaffeProcessor$$anonfun$5.apply(CaffeProcessor.scala:42)
  at com.yahoo.ml.caffe.CaffeProcessor$$anonfun$5.apply(CaffeProcessor.scala:41)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
  at com.yahoo.ml.caffe.CaffeProcessor.<init>(CaffeProcessor.scala:41)
  at com.yahoo.ml.caffe.CaffeProcessor$.instance(CaffeProcessor.scala:22)
  at com.yahoo.ml.caffe.CaffeOnSpark$$anonfun$5.apply(CaffeOnSpark.scala:128)
  at com.yahoo.ml.caffe.CaffeOnSpark$$anonfun$5.apply(CaffeOnSpark.scala:126)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
  at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
  at scala.collection.AbstractIterator.to(Iterator.scala:1157)
  at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
  at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
  at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
  at org.apache.spark.scheduler.Task.run(Task.scala:89)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

The contents of the current directory ({SPARK_HOME}/data) are as follows:

$ ll
total 96
drwxrwxr-x 10 abhishek abhishek  4096 Sep 12 15:50 ./
drwxrwxr-x 12 abhishek abhishek  4096 Sep  8 22:26 ../
drwxrwxr-x  5 abhishek abhishek  4096 Sep 12 13:54 caffe/
-rwxrwxr-x  1 abhishek abhishek  5567 Aug 22 18:51 caffenet_train_net.prototxt*
-rwxrwxr-x  1 abhishek abhishek   847 Aug 22 18:51 cifar10_quick_solver.prototxt*
-rwxrwxr-x  1 abhishek abhishek  3340 Aug 22 18:51 cifar10_quick_train_test.prototxt*
drwxr--r--  2 abhishek abhishek  4096 Sep  8 21:36 cifar10_test_lmdb/
drwxr--r--  2 abhishek abhishek  4096 Sep  8 21:36 cifar10_train_lmdb/
drwxrwxr-x  3 abhishek abhishek  4096 Sep 12 13:55 com/
drwxrwxr-x  2 abhishek abhishek  4096 Sep 12 13:54 examples/
drwxrwxr-x  2 abhishek abhishek  4096 Aug 22 18:51 images/
-rwxrwxr-x  1 abhishek abhishek   648 Aug 22 18:51 lenet_cos_solver.prototxt*
-rwxrwxr-x  1 abhishek abhishek  2894 Aug 22 18:51 lenet_cos_train_test.prototxt*
-rw-rw-r--  1 abhishek abhishek   692 Aug 22 18:51 lenet_dataframe_solver.prototxt
-rw-rw-r--  1 abhishek abhishek  2544 Aug 22 18:51 lenet_dataframe_train_test.prototxt
-rwxrwxr-x  1 abhishek abhishek   651 Aug 22 18:51 lenet_memory_solver.prototxt*
-rwxrwxr-x  1 abhishek abhishek  2581 Aug 24 18:00 lenet_memory_train_test.prototxt*
-rw-rw-r--  1 abhishek abhishek 12299 Sep  8 21:36 mean.binaryproto
drwxr--r--  2 abhishek abhishek  4096 Sep  8 22:09 mnist_test_lmdb/
drwxr--r--  2 abhishek abhishek  4096 Sep  8 22:09 mnist_train_lmdb/

If anyone has tried this and knows what is going on, please tell me why it is giving me this error.


1 Answer


I don't have enough reputation to comment yet, so here goes.

From the logs, the only thing I suspect is whether you performed this step correctly: https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_python#submit-python-script

When launching the Python script, make sure that lenet_memory_train_test.prototxt and the other required files are specified with correct paths.
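One quick local check (a hypothetical diagnostic sketch, not part of the CaffeOnSpark API) is to read the `net:` field out of the solver prototxt and confirm that the train/test file it references actually resolves. Note that a bare filename like `lenet_memory_train_test.prototxt` is resolved against each worker's working directory, not against the directory that contains the solver file, which is consistent with the FileNotFoundException above:

```python
import os
import re

def check_solver_net(solver_path):
    """Parse the 'net:' field of a Caffe solver prototxt and report
    whether the referenced train/test prototxt exists.

    Returns (net_path, exists). A relative net_path is resolved
    against the current working directory, mirroring how the worker
    process would try to open it."""
    with open(solver_path) as f:
        text = f.read()
    match = re.search(r'net\s*:\s*"([^"]+)"', text)
    if match is None:
        return None, False
    net_path = match.group(1)
    return net_path, os.path.exists(net_path)
```

If this returns `exists == False` on the machine running the executor, the solver's `net:` entry is the likely culprit: either make it an absolute path or ship the file so the relative name resolves.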

pushd ${CAFFE_ON_SPARK}/data/
unzip ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip
spark-submit  --master ${MASTER_URL} \
          --driver-library-path "${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar" \
          --driver-class-path "${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar" \
          --conf spark.cores.max=${TOTAL_CORES} \
          --conf spark.task.cpus=${CORES_PER_WORKER} \
          --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
          --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
          --py-files ${CAFFE_ON_SPARK}/caffe-grid/target/caffeonsparkpythonapi.zip \
          --files ${CAFFE_ON_SPARK}/data/caffe/_caffe.so,${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt \
          --jars "${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar" \
          --conf spark.pythonargs="-conf lenet_memory_solver.prototxt -model file:///tmp/lenet.model -features accuracy,ip1,ip2 -label label -output file:///tmp/output -devices 1 -outputFormat json -clusterSize ${SPARK_WORKER_INSTANCES}" \
          examples/MultiClassLogisticRegression.py

Note the --files flag.
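The reason --files matters here: spark-submit copies each listed file into every task's working directory, so a bare filename inside the solver prototxt (its `net: "lenet_memory_train_test.prototxt"` entry) resolves there at runtime. A rough sketch of that copying behavior, purely for illustration (`ship_files` is a hypothetical helper, not a Spark API):

```python
import os
import shutil

def ship_files(file_paths, work_dir):
    """Roughly mimic what spark-submit --files does: copy each listed
    file into the task's working directory, so that a bare filename
    reference (e.g. the solver's net: entry) resolves there."""
    for path in file_paths:
        shutil.copy(path, os.path.join(work_dir, os.path.basename(path)))
```

So if lenet_memory_train_test.prototxt is missing from the --files list, the executor's working directory never receives it, and the open fails exactly as in your stack trace.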

Answered 2017-02-05T03:30:54.390