How do I copy a file that a Hadoop program requires to all compute nodes? I am aware that the -file option does this for Hadoop streaming. How do I do the same for a Java Hadoop job?
1 answer
Exactly the same way.
Assuming you use the ToolRunner / Configured / Tool pattern, the files you list after the -files option are copied to the compute nodes and are available in the local working directory when your mapper, reducer, and combiner tasks run:
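import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options such as -files before calling run()
        System.exit(ToolRunner.run(new Driver(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        // ...
        return job.waitForCompletion(true) ? 0 : 1;
    }
}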
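import java.io.File;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<K1, V1, K2, V2> {
    @Override
    public void setup(Context context) {
        // file.csv was shipped with -files, so a relative path resolves in the task's working directory
        File myFile = new File("file.csv");
        // do something with file
    }
    // ...
}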
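As an illustration of the "do something with file" step, here is a minimal sketch that loads the shipped file into an in-memory lookup table. It assumes file.csv contains one key,value pair per line, which the question does not state. LookupLoader is a hypothetical helper name, not part of Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): reads "key,value" lines into a map.
public class LookupLoader {
    public static Map<String, String> load(Path csv) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split on the first comma only, so values may themselves contain commas.
                int comma = line.indexOf(',');
                if (comma < 0) {
                    continue; // skip blank or malformed lines
                }
                lookup.put(line.substring(0, comma).trim(),
                           line.substring(comma + 1).trim());
            }
        }
        return lookup;
    }
}
```

Inside setup() you could then call LookupLoader.load(java.nio.file.Paths.get("file.csv")) and keep the result in a field of the mapper. setup() would need to declare throws IOException for that call.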
You can then execute with the following command (generic options such as -files must come before your job's own arguments):
#> hadoop jar myJar.jar Driver -files file.csv ......
See the Javadoc for GenericOptionsParser for more information.
Answered 2012-04-20T01:57:22.307