Hi, I'm using this script to recrawl with Nutch, but it throws an exception:
Indexer: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/home/hat/crawl/indexes already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:76)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:97)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:106)
Script:
bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments
s1=$(ls -d crawl/segments/2* | tail -1)
echo "$s1"
bin/nutch fetch "$s1" -threads 100 -depth 3 -topN 5
bin/nutch updatedb crawl/crawldb "$s1"
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
I took this from http://wiki.apache.org/nutch/NutchTutorial
Can anyone tell me what's wrong?
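For context, the trace ends in FileOutputFormat.checkOutputSpecs: Hadoop refuses to start a job whose output directory already exists, and crawl/indexes is still there from the first crawl. Below is a minimal sketch of one workaround for the last step. It assumes the old index can simply be set aside, since the index command above is given every segment under crawl/segments and the rebuilt index should therefore cover the earlier crawl too. The helper names move_old_index and reindex and the timestamp suffix are hypothetical, not part of Nutch.

```shell
#!/bin/sh
# Sketch (assumption): rename the previous run's index with a timestamp so
# Hadoop's FileOutputFormat sees a non-existent output directory.
# The "<dir>.<timestamp>" naming is hypothetical, not a Nutch convention.

# Move an existing index directory out of the way; no-op if it is absent.
move_old_index() {
  dir="$1"
  if [ -d "$dir" ]; then
    backup="$dir.$(date +%Y%m%d%H%M%S)"
    # Refuse to clobber: mv into an existing directory would nest it instead.
    if [ -e "$backup" ]; then
      echo "backup $backup already exists" >&2
      return 1
    fi
    mv "$dir" "$backup"
    echo "moved old index to $backup"
  fi
}

# Hypothetical replacement for the last line of the script above.
reindex() {
  move_old_index crawl/indexes || return 1
  bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
}
```

Calling reindex in place of the final bin/nutch index line should let the job start on a second run. Renaming rather than deleting keeps the previous index around until the new one has been checked; plain rm -rf crawl/indexes would also satisfy Hadoop if the old copy is not needed.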