java - Mahout: KMeans clustering

Question

I am new to Mahout, and I have the following code:

public class mahout {

    public static final double[][] points = {
        {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3}, {8, 8}, {9, 8}, {8, 9}, {9, 9}};

    // Convert the raw coordinate arrays into Mahout vectors.
    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }

    public static void main(String args[]) throws Exception {

        int k = 2;

        List<Vector> vectors = getPoints(points);

        // Create the input directories if they do not exist yet.
        File testData = new File("testdata");
        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        // Write the input points to testdata/points/file1.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        ClusterHelper.writePointsToFile(vectors, conf, new Path("testdata/points/file1"));

        // Seed the k initial clusters with the first k points.
        Path path = new Path("testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
            path, Text.class, Kluster.class);

        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        Path output = new Path("output");
        HadoopUtil.delete(conf, output);

        // Run k-means with convergence delta 0.001 and at most 10 iterations.
        KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
            output, new EuclideanDistanceMeasure(), 0.001, 10,
            true, 0.0, false);

        // Read back the clustered points and print each point's cluster id.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
            new Path("output/" + Kluster.CLUSTERED_POINTS_DIR
                     + "/part-m-00000"), conf);

        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster "
                               + key.toString());
        }
        reader.close();
    }
}

But when I run the code, I get these errors:

 24-ott-2013 9.50.25 org.apache.hadoop.util.NativeCodeLoader <clinit>
AVVERTENZA: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info
INFO: Deleting output
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info
INFO: Input: testdata/points Clusters In: testdata/clusters Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info
INFO: convergence: 0.0010 max Iterations: 10
24-ott-2013 9.50.25 org.apache.hadoop.security.UserGroupInformation doAs
GRAVE: PriviledgedActionException as:hp cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:182)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:223)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
    at mahout.main(mahout.java:69)

What is the problem, and how can I fix it?

score 0 · Accepted Answer

This is a known problem when running Hadoop on Windows.

There are JIRA issues covering this specific failure:

https://issues.apache.org/jira/browse/HADOOP-7682

https://issues.apache.org/jira/browse/HADOOP-8089

The workaround is to patch Hadoop with this patch:

https://github.com/congainc/patch-hadoop_7682-1.0.x-win

Alternatively, upgrade to Hadoop 2.2, which runs natively on Windows.
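To see whether your machine is affected before patching anything, a small probe like the one below may help. It is only a sketch, and it assumes (based on the `FileUtil.setPermission` frames in the stack trace and the JIRA issues above) that the failure comes from the `java.io.File` permission setters reporting failure on Windows. `PermissionProbe`, `isWindows` and `canRestrictToOwner` are hypothetical names, not part of Hadoop or Mahout.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Hadoop or Mahout.
class PermissionProbe {

    // True when the given os.name value looks like a Windows platform.
    static boolean isWindows(String osName) {
        return osName != null
            && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Tries to restrict a directory to owner-only rwx (0700) using only
    // java.io.File. Each permission is first cleared for everybody, then
    // granted to the owner. Returns false as soon as one setter fails.
    static boolean canRestrictToOwner(File dir) {
        return dir.setReadable(false, false) && dir.setReadable(true, true)
            && dir.setWritable(false, false) && dir.setWritable(true, true)
            && dir.setExecutable(false, false) && dir.setExecutable(true, true);
    }

    public static void main(String[] args) throws IOException {
        // Probe a throwaway directory rather than the real staging directory.
        File dir = Files.createTempDirectory("staging-probe").toFile();
        dir.deleteOnExit();

        String osName = System.getProperty("os.name");
        System.out.println("os.name = " + osName);
        System.out.println("looks like Windows = " + isWindows(osName));
        System.out.println("0700 via java.io.File = " + canRestrictToOwner(dir));
    }
}
```

If the last line prints `false`, granting your user more rights on the directory is unlikely to help, and patching or upgrading Hadoop as described above is the more promising route.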

score -1 · Accepted Answer

The problem appears to be:

Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700

Check whether the user running your code has sufficient permissions on the directory mentioned in the stack trace.
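One way to do that check from Java is sketched below. It reports what the current user may do with a directory, using only standard `java.nio.file` calls. `AccessReport` and `describe` are hypothetical names, and the default path is simply the staging parent directory copied from the error message, so pass your own directory as the first argument if it differs.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Hadoop or Mahout.
class AccessReport {

    // Returns a three-character summary such as "rwx" or "r--" describing
    // what the current user may do with the path. A path that does not
    // exist yields "---".
    static String describe(Path p) {
        return (Files.isReadable(p) ? "r" : "-")
             + (Files.isWritable(p) ? "w" : "-")
             + (Files.isExecutable(p) ? "x" : "-");
    }

    public static void main(String[] args) {
        // Default taken from the error message in the question.
        Path dir = Paths.get(args.length > 0
            ? args[0]
            : "\\tmp\\hadoop-hp\\mapred\\staging");
        System.out.println(dir + " exists=" + Files.exists(dir)
            + " access=" + describe(dir));
    }
}
```

This only shows the access the user currently has; it does not show whether Hadoop is able to change the permissions to 0700, which is the step that fails in the stack trace.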

Also, this line in the trace:

Unable to load native-hadoop library for your platform...

makes me worry that nothing will run properly anyway ^^
