I'm running some transformations on a data set and need to publish the result in a sane-looking format. Currently my final relation looks like this when I run DESCRIBE:

{memberId: long,companyIds: {(subsidiary: long)}}

I need it to look like this:

{memberId: long,companyIds: [long] }

where companyIds is the key to an array of ids of type long.

I'm really struggling with how to manipulate the data this way. Any ideas? I've tried FLATTEN and other commands to no avail. I'm using AvroStorage to write the files.

The field schema I need to write this data to looks like this:

"fields": [
        { "name": "memberId", "type": "long"},
        { "name": "companyIds", "type": {"type": "array", "items": "int"}}
      ]

2 Answers

There is no array type in Pig (http://pig.apache.org/docs/r0.10.0/basic.html#data-types). However, if all you need is readable output and companyIds does not hold too many elements, you can write a simple UDF that converts the bag into a nicely formatted string.

Java code

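import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.lang.StringUtils; // assumed: commons-lang, which ships with Pig
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

// Joins the first field of every tuple in a bag into one ':'-separated string.
public class BagToString extends EvalFunc<String>
{
    @Override
    public String exec(Tuple input) throws IOException
    {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        List<String> strings = new ArrayList<String>();
        DataBag bag = (DataBag) input.get(0);
        if (bag.size() == 0) {
            return null;
        }
        for (Iterator<Tuple> it = bag.iterator(); it.hasNext();) {
            Tuple t = it.next();
            strings.add(String.valueOf(t.get(0)));
        }
        return StringUtils.join(strings, ":");
    }
}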

PIG script

 foo = foreach bar generate memberId, BagToString(companyIds);
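
To check the output format without a Pig runtime, the same joining logic can be sketched in plain Java. This is only an illustration: the class BagJoinSketch and the method joinIds are hypothetical names, and a List<Long> stands in for the Pig DataBag.

```java
import java.util.Arrays;
import java.util.List;

class BagJoinSketch {
    // Hypothetical helper mirroring the UDF's loop; a plain List replaces the Pig bag.
    static String joinIds(List<Long> ids) {
        if (ids == null || ids.isEmpty()) {
            return null; // same convention as the UDF: an empty bag yields null
        }
        StringBuilder sb = new StringBuilder();
        for (Long id : ids) {
            if (sb.length() > 0) {
                sb.append(':'); // separator goes between elements only
            }
            sb.append(id);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinIds(Arrays.asList(101L, 202L, 303L))); // prints 101:202:303
    }
}
```

Note that the result is a single string field, so downstream consumers have to split it on ':' themselves; it is not an Avro array.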
Answered 2013-06-21T00:14:29.137

I know this is a bit old, but I recently ran into the same problem.

According to the AvroStorage documentation, with the latest versions of Pig and AvroStorage it is possible to write a bag directly as an Avro array.

In your case, you may want something like:

STORE blah INTO 'blah' USING AvroStorage('schema','{your schema}');

where the array field in the schema is

{  
    "name":"companyIds",
    "type":[  
        "null",
        {  
            "type":"array",
            "items":"long"
        }
    ],
    "doc":"company ids"
}
Answered 2015-11-12T23:05:49.410