
I followed every step in the Apache Nutch wiki. I am using Mac OS X 10.8.3, my JAVA_HOME is set correctly, and when I run bin/nutch with no arguments I see the list of command options, as the wiki says I should.

But when I run bin/nutch crawl urls -dir crawl -depth 3 -topN 5, I get the following error:

bin/nutch: line 104: [: too many arguments
Error: Could not find or load main class Engines

FYI: I have already created the urls directory (apache-nutch-1.6/urls).

Can anyone tell me what the problem might be?
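For reference, here is one common way a bash script ends up printing `[: too many arguments` (as on line 104 of bin/nutch above): an unquoted variable whose value contains spaces is split into several words before `[` sees it. This is a general shell illustration with a hypothetical path; the thread does not show what line 104 of bin/nutch actually tests.

```shell
#!/usr/bin/env bash
# Hypothetical path containing spaces (not taken from the question).
dir="/tmp/my nutch dir/apache-nutch-1.6"

# Unquoted: word splitting turns the test into
#   [ -d /tmp/my nutch dir/apache-nutch-1.6 ]
# which gives [ more operands than -d expects; bash prints
# "[: too many arguments" and the test returns status 2.
unquoted_status=0
[ -d $dir ] || unquoted_status=$?

# Quoted: the path stays one word, so [ runs a normal test and
# returns 1 here simply because the directory does not exist.
quoted_status=0
[ -d "$dir" ] || quoted_status=$?

echo "unquoted: $unquoted_status, quoted: $quoted_status"
```

Quoting the expansion ("$dir") is what keeps the path as a single argument.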


2 Answers


After some research, I found that I had forgotten to set NUTCH_JAVA_HOME. Here are the steps:

NUTCH_JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export NUTCH_JAVA_HOME

I also reset JAVA_HOME:

JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export JAVA_HOME
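A small sketch of why the export matters, assuming a bash-compatible shell: bin/nutch runs as a child process, so it only sees variables that were exported. It also shows a common pitfall: in bash, `set NAME=value` does not define NAME at all (that is Windows cmd syntax); it overwrites the positional parameters instead. The Java 6 path is the one from this answer; on a non-macOS machine it is just a string, and the child `sh -c` stands in for bin/nutch.

```shell
#!/usr/bin/env bash
# The macOS Java 6 path used in the answer; elsewhere it is just a string.
JDK6=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
unset JAVA_HOME NUTCH_JAVA_HOME

# Pitfall: 'set NAME=value' does not define NAME in bash. It replaces the
# positional parameters, so $1 becomes the literal text "NAME=value".
set NUTCH_JAVA_HOME="$JDK6"
echo "after set:    \$1=$1, NUTCH_JAVA_HOME=${NUTCH_JAVA_HOME:-<unset>}"

# Correct: assign, then export so child processes (such as bin/nutch) inherit it.
JAVA_HOME="$JDK6"
export JAVA_HOME
export NUTCH_JAVA_HOME="$JDK6"

# A child shell stands in for bin/nutch to confirm both values are inherited.
sh -c 'echo "child sees JAVA_HOME=$JAVA_HOME NUTCH_JAVA_HOME=$NUTCH_JAVA_HOME"'
```

Putting the two export lines in ~/.bash_profile would make them apply to every new Terminal session rather than only the current one.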
answered 2013-05-14T16:25:53.273

You can try the following:

First, build Nutch with ant.

cd nutch-1.x.x/runtime/local/

mkdir urls (the seed-list directory)

mkdir crawl (the directory passed to -dir)

vim urls/seed, then add one or more URLs, one per line (e.g. http://www.examplesite.com)

bin/nutch crawl urls

or

bin/nutch crawl urls -dir crawl -depth 3 -topN 5
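The steps above can be scripted roughly as follows. This is a sketch, not an official Nutch script: NUTCH_LOCAL is a hypothetical variable of this script (Nutch does not read it) that should point at your nutch-1.x.x/runtime/local directory. When it is unset, a throwaway temp directory is used so only the seed-list setup runs, and printf replaces the interactive vim step.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical: NUTCH_LOCAL -> your nutch-1.x.x/runtime/local; else a temp dir.
local_dir="${NUTCH_LOCAL:-$(mktemp -d)}"
cd "$local_dir"

mkdir -p urls crawl                                    # seed-list dir and -dir target
printf '%s\n' 'http://www.examplesite.com' > urls/seed   # one URL per line

# Only start the crawl if this really is a built runtime/local directory.
if [ -x bin/nutch ]; then
  bin/nutch crawl urls -dir crawl -depth 3 -topN 5
else
  echo "bin/nutch not found in $local_dir; seed list written to urls/seed"
fi
```

Usage: NUTCH_LOCAL=/path/to/nutch-1.x.x/runtime/local ./setup-crawl.sh (the script name is arbitrary).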

answered 2013-05-14T12:29:19.040