
When using the DistributedCache in Hadoop, I manage to push files from HDFS in the driver class like this:

Configuration conf = job.getConfiguration();
FileSystem fileSystem = FileSystem.get(conf);
DistributedCache.createSymlink(conf);
DistributedCache.addCacheFile(fileSystem.getUri().resolve("/dumps" + "#" + "file.txt"), conf);
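For context, the `#file.txt` fragment in the cache URI is what names the symlink that the framework creates in the task's working directory, while the part before the `#` is the HDFS path being cached. The fragment handling itself is plain `java.net.URI` behavior and can be checked in isolation; the sketch below is mine, with a placeholder `hdfs://namenode:8020` filesystem URI standing in for whatever `fileSystem.getUri()` returns. Note that the path component of the resulting URI is the directory `/dumps`:

```java
import java.net.URI;

public class CacheUriDemo {
    public static void main(String[] args) {
        // Build the cache URI the same way as in the driver snippet above:
        // the part before '#' is the path on HDFS, the part after '#' is
        // the symlink name created in the task's working directory.
        URI cacheUri = URI.create("hdfs://namenode:8020")
                .resolve("/dumps" + "#" + "file.txt");
        System.out.println(cacheUri.getPath());     // path component: /dumps
        System.out.println(cacheUri.getFragment()); // symlink name: file.txt
    }
}
```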

Then, to read the file, I do the following in the Mapper's setup():

Path[] localPaths = context.getLocalCacheFiles();

The file is located in the cache, under the path /tmp/solr-map-reduce/yarn-local-dirs/usercache/user/appcache/application_1398146231614_0045/container_1398146231614_0045_01_000004/file.txt. But when I try to read it, I get an IOException: file is a directory.
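Since createSymlink was called, the localized item should also be reachable through the symlink name `file.txt` relative to the task's working directory, so the read step itself is ordinary local file I/O. A minimal sketch of that step using plain `java.io` (the helper name `readFirstLine` is mine, not a Hadoop API; in a real job it would be called from setup() with the symlink name):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CacheReadSketch {
    // Read the first line of a localized cache file, given its local name
    // or path, e.g. readFirstLine("file.txt") from inside Mapper.setup().
    static String readFirstLine(String localName) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
            return reader.readLine();
        }
    }
}
```

If the symlink actually points at a directory rather than a file, this read fails with the same kind of IOException described above.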

How can one go about solving this?

