How do I copy a file that a Hadoop program requires to all compute nodes? I am aware that the -file option for Hadoop streaming does that. How do I do this for a Java Hadoop job?

1 Answer

Exactly the same way.

Assuming you use the ToolRunner / Configured / Tool pattern, the files you pass with the -files option are copied to each task's working directory, so they are available locally when your mapper, reducer, or combiner tasks run:

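import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options such as -files before calling run()
        System.exit(ToolRunner.run(new Driver(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        // ...
        // Return 0 on success, 1 on failure
        return job.waitForCompletion(true) ? 0 : 1;
    }
}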

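import java.io.File;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<K1, V1, K2, V2> {
    @Override
    protected void setup(Context context) {
        // The relative path resolves against the task's working directory
        File myFile = new File("file.csv");
        // do something with file
    }

    // ...
}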

You can then execute with:

#> hadoop jar myJar.jar Driver -files file.csv ......

See the Javadoc for GenericOptionsParser for more info.
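As a rough sketch of the "do something with file" step, the helper below reads a two-column CSV from a relative path into a map, which is one thing setup() might do with file.csv. The class name SideFileLoader, the method load, and the key,value layout of the file are all assumptions for illustration; none of them come from Hadoop. It uses only the Java standard library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): loads "key,value" lines into a map.
final class SideFileLoader {

    private SideFileLoader() {
    }

    // Reads the file at the given path; lines without a comma are skipped.
    static Map<String, String> load(String path) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split on the first comma only, so values may contain commas
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    table.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
        return table;
    }

    // Standalone check: pass a path as the first argument, or default to file.csv
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "file.csv";
        System.out.println(load(path).size() + " entries loaded from " + path);
    }
}
```

Inside the mapper you would call something like SideFileLoader.load("file.csv") from setup() and keep the result in a field, so the file is read once per task rather than once per record.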

Answered 2012-04-20T01:57:22.307