I'd like to count the number of keys in a map in Pig. I could write a UDF to do this, but I was hoping there would be an easier way.
data = LOAD 'hbase://MARS1'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'A:*', '-loadKey true -caching=100000')
AS (id:bytearray, A_map:map[]);
In the code above, I want to basically build a histogram of id
and how many items in column family A
that key has.
In hoping, I tried c = FOREACH data GENERATE id, COUNT(A_map);
but that unsurprisingly didn't work.
Or, perhaps someone can suggest a better way to do this entirely. If I can't figure this out soon I'll just write a Java MapReduce job or a Pig UDF.