Is it possible to get URLs into Nutch directly from a database or a service etc. I'm not interested in the ways which data is taken from the database or service and written to seed.txt.
问问题
612 次
1 回答
1
No. This cannot be done directly with the default nutch codebase. You need to modify Injector.java to achieve that.
EDIT:
Try using DBInputFormat : an InputFormat that reads input data from an SQL table. You need to modify the Inject code here (line 3 in snippet below):
JobConf sortJob = new NutchJob(getConf());
sortJob.setJobName("inject " + urlDir);
FileInputFormat.addInputPath(sortJob, urlDir);
sortJob.setMapperClass(InjectMapper.class);
于 2012-04-16T01:44:01.917 回答