I am running the Mahout kmeans algorithm from the command line to cluster my data, using the following command:

mahout kmeans -i /vect_out/tfidf-vectors/ -c /out_canopy -o /out_kmeans -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -cd 1.0 -x 20 -cl

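Here /out_canopy is the directory holding the clusters created with Mahout canopy clustering. It contains a clusters-0 directory, which itself contains a directory named _logs and a file named part-r-00000.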

But I keep getting the following error:

java.lang.IllegalStateException: No clusters found. Check your -c path.
at org.apache.mahout.clustering.kmeans.KMeansMapper.setup

2 Answers

Are you sure that /out_canopy is a directory? Have you tried:

file /out_canopy

It looks like there may be a typo; perhaps you meant to write just out_canopy, or something similar...

answered 2013-03-11T11:06:20.563

This is a particularly troublesome problem.

1. Swallow IllegalStateExceptions thrown by removeShutdownHook in FileSystem. The javadoc states:

    public boolean removeShutdownHook(Thread hook)
    Throws:
    IllegalStateException - If the virtual machine is already in the process of shutting down 

So if we are getting this exception, it MEANS we are already in the process of shutdown, so we CANNOT, try what we may, removeShutdownHook. If Runtime had a method Runtime.isShutdownInProgress(), we could have checked for it before the removeShutdownHook call. As it stands, there is no such method. In my opinion, this would be a good patch regardless of the needs for this JIRA.

2. Not send SIGTERMs from the NM to the MR-AM in the first place. Rather, we should expose a mechanism for the NM to politely tell the AM that it is no longer needed and should shut down as soon as possible. Even after this, if an admin were to kill the MRAppMaster with a SIGTERM, the JobHistory would be lost, defeating the purpose of 3614.
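
The first option above (swallowing the IllegalStateException) could look roughly like the following. This is only a minimal sketch: ShutdownHookUtil and removeHookQuietly are hypothetical names chosen for illustration, not part of Hadoop's FileSystem. The only real API it relies on is Runtime.removeShutdownHook, which returns false for a hook that is not registered and throws IllegalStateException once shutdown has begun.

```java
/**
 * Sketch only: ShutdownHookUtil and removeHookQuietly are hypothetical
 * names, not taken from Hadoop's FileSystem class.
 */
final class ShutdownHookUtil {
    private ShutdownHookUtil() {}

    /**
     * Tries to deregister a shutdown hook.
     *
     * @return true if the hook was registered and has now been removed;
     *         false if it was not registered, or if the JVM is already
     *         shutting down and the hook can no longer be removed.
     */
    static boolean removeHookQuietly(Thread hook) {
        try {
            return Runtime.getRuntime().removeShutdownHook(hook);
        } catch (IllegalStateException e) {
            // Shutdown is already in progress, so removal is impossible.
            // There is nothing useful to do here, so report failure.
            return false;
        }
    }

    public static void main(String[] args) {
        Thread hook = new Thread(() -> System.out.println("hook ran"));
        Runtime.getRuntime().addShutdownHook(hook);

        System.out.println(removeHookQuietly(hook)); // prints "true"
        System.out.println(removeHookQuietly(hook)); // prints "false" (already removed)
    }
}
```

Returning false in both the "not registered" and the "already shutting down" cases keeps the caller simple; a caller that needs to tell the two apart would have to track shutdown state itself, since Runtime has no method to query it.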
answered 2017-08-05T09:48:15.873