I'm using Hadoop to compute co-occurrence similarity between words. I have a file that consists of co-occurring word pairs that looks like:
a b
a c
b c
b d
I'm using a graph-based approach that treats words as nodes, with an edge between every pair of co-occurring words. My algorithm needs to compute the degree of every node. I've successfully written a Map-Reduce job to compute the total degree of each node, which outputs the following:
a 2
b 3
c 2
d 1
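For reference, the degree computation amounts to the following. This is a plain-Java sketch of the logic rather than the actual Hadoop job: the "map" step emits (word, 1) for each endpoint of a pair, and the "reduce" step sums the ones per word. The class name DegreeSketch and the method degrees are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DegreeSketch {
    // Stand-in for the map and reduce steps: every pair contributes 1 to the
    // degree of each of its two words, and the contributions are summed per word.
    static Map<String, Integer> degrees(List<String> pairLines) {
        Map<String, Integer> degree = new TreeMap<>(); // sorted keys, like the job output
        for (String line : pairLines) {
            String[] words = line.trim().split("\\s+");
            if (words.length != 2) {
                continue; // skip blank or malformed lines
            }
            degree.merge(words[0], 1, Integer::sum);
            degree.merge(words[1], 1, Integer::sum);
        }
        return degree;
    }

    public static void main(String[] args) {
        Map<String, Integer> degree = degrees(List.of("a b", "a c", "b c", "b d"));
        // Prints one "word degree" line per node.
        degree.forEach((word, d) -> System.out.println(word + " " + d));
    }
}
```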
Currently, the output is written back to a file, but what I want instead is to capture the result in memory, say in a java.util.HashMap. I then want to use this HashMap in another Reduce job to compute the final similarity.
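To make the goal concrete, this is roughly the in-memory structure I have in mind: a minimal sketch that parses the "word degree" lines into a HashMap. It assumes the output is whitespace-separated text and reads from a generic java.io.Reader, so it does not use any Hadoop API; the names DegreeLoader and load are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class DegreeLoader {
    // Parses lines of the form "word degree" into a word -> degree map.
    // Assumption: fields are separated by whitespace (spaces or tabs).
    static Map<String, Integer> load(Reader source) {
        Map<String, Integer> degree = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length != 2) {
                    continue; // skip blank or malformed lines
                }
                degree.put(parts[0], Integer.parseInt(parts[1]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return degree;
    }

    public static void main(String[] args) {
        // The sample output from the degree job, inlined as a string.
        Map<String, Integer> degree = load(new StringReader("a 2\nb 3\nc 2\nd 1\n"));
        System.out.println(degree.get("b"));
    }
}
```

In a real job the Reader would have to be opened on the reducer's output file, which is the part I don't know how to do properly.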
Here are my questions:
- Is it possible to capture the results of a reduce job in memory (List, Map)? If so, how?
- Is this the best approach? If not, how should I deal with this?