When using the DistributedCache in Hadoop, I push files from HDFS to the tasks in the driver class like this:
    FileSystem fileSystem = FileSystem.get(getConf());
    DistributedCache.createSymlink(conf);
    DistributedCache.addCacheFile(fileSystem.getUri().resolve("/dumps" + "#" + "file.txt"), job.getConfiguration());
Then, to read the file, I do this in the Mapper's setup():
    Path localPaths[] = context.getLocalCacheFiles();
The symlink does show up in the local cache, at /tmp/solr-map-reduce/yarn-local-dirs/usercache/user/appcache/application_1398146231614_0045/container_1398146231614_0045_01_000004/file.txt. But when I try to read it, I get an IOException saying the file is a directory.
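To see what the driver actually registers, the resolve call can be reproduced with plain java.net.URI, no Hadoop needed. The namenode address below is a made-up placeholder standing in for whatever fileSystem.getUri() returns:

```java
import java.net.URI;

public class CacheUriDemo {
    public static void main(String[] args) {
        // Hypothetical HDFS base URI standing in for fileSystem.getUri()
        URI fsUri = URI.create("hdfs://namenode:8020/");

        // Same expression as in the driver code above
        URI cacheUri = fsUri.resolve("/dumps" + "#" + "file.txt");

        // The part before '#' is the HDFS path that gets cached;
        // the part after '#' is only the name of the local symlink.
        System.out.println("path     = " + cacheUri.getPath());     // prints "/dumps"
        System.out.println("fragment = " + cacheUri.getFragment()); // prints "file.txt"
    }
}
```

So the entry handed to addCacheFile points at the HDFS path /dumps itself, with file.txt merely as the symlink name. If /dumps is a directory on HDFS, the local symlink file.txt would resolve to that directory, which seems consistent with the "file is a directory" error.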
How can I fix this?