I'd like to count the number of keys in a map in Pig. I could write a UDF to do this, but I was hoping there would be an easier way.

data = LOAD 'hbase://MARS1'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
         'A:*', '-loadKey true -caching=100000')
       AS (id:bytearray, A_map:map[]);

With the data loaded above, I basically want to build a histogram: for each id, how many items that key has in column family A.

Hoping it would just work, I tried c = FOREACH data GENERATE id, COUNT(A_map); but unsurprisingly that didn't work, since COUNT expects a bag rather than a map.

Or, perhaps someone can suggest a better way to do this entirely. If I can't figure this out soon I'll just write a Java MapReduce job or a Pig UDF.

1 Answer

The built-in SIZE function should apparently work for you (I haven't tried it myself). Applied to a map, it returns the number of key/value pairs.
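A minimal sketch of how that might look, reusing the LOAD statement from the question. The alias names counts and num_keys are made up for illustration, and this has not been run against the original HBase table, so treat it as a starting point rather than a verified solution:

```pig
-- Load the row key plus every column in family A as a map (taken from the question).
data = LOAD 'hbase://MARS1'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
         'A:*', '-loadKey true -caching=100000')
       AS (id:bytearray, A_map:map[]);

-- SIZE is a Pig built-in; on a map it returns the number of key/value pairs.
-- "counts" and "num_keys" are illustrative names, not part of the original post.
counts = FOREACH data GENERATE id, SIZE(A_map) AS num_keys;

DUMP counts;
```

Because SIZE is evaluated per row inside the FOREACH, no GROUP is needed to get the per-id count, which is what COUNT would have required.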

Answered 2012-12-05T23:01:37.613