In a Pig script, I am manipulating tuples of the following form:

(a1:int,a2:chararray,a3:int)

An example value of a2 is "123,232,444,223,100": five numbers between 100 and 500, separated by commas.

I would like to get the following tuple:

(a1:int,u1:int,u2:int,u3:int,u4:int,u5:int,a3:int)

where u1 to u5 correspond to the values in the a2 chararray.

Is it possible to do this using only Pig functions?

I have tried to write a UDF in Python as follows:

@outputSchema("int:u1,int:u2,int:u3,int:u4,int:u5")
def mosListToTuple(list):
    u1 = list[0:3]
    u2 = list[5:8]
    u3 = list[10:13]
    u4 = list[15:18]
    u5 = list[20:23]
    return u1,u2,u3,u4,u5

But I get the error:

ERROR 1200: <line 1, column 4>  Syntax error, unexpected symbol at or near 'u1'

Any ideas?

Thank you.

1 Answer

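You don't need to write your own UDF for this. The built-in STRSPLIT splits a2 on the commas into a tuple, FLATTEN expands that tuple into top-level fields, and the AS clause names and types them: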

B =
    FOREACH A
    GENERATE
        a1, 
        FLATTEN(STRSPLIT(a2, ',')) AS (u1:int,u2:int,u3:int,u4:int,u5:int),
        a3;
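
If a Python UDF is still preferred, a likely cause of ERROR 1200 in the question is the schema string: Pig schemas are written as name:type, so "int:u1" does not parse. Below is a minimal sketch of a corrected UDF, assuming it is registered as a Jython UDF. The function name mos_list_to_tuple, the tuple alias t, and the fallback decorator (a stand-in so the file can also be imported outside Pig) are illustrative assumptions, not part of the original question or answer. It also splits on commas instead of slicing at fixed offsets, because the offsets in the question (5:8, 10:13, ...) do not line up with a string like "123,232,444,223,100".

```python
# Sketch of a corrected UDF. Assumption: Pig supplies outputSchema at runtime;
# the fallback below only exists so the file can be imported and tested locally.
try:
    from pig_util import outputSchema  # available for streaming Python UDFs
except ImportError:
    def outputSchema(schema):
        # Stand-in decorator: records the schema string on the function.
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


# Schema fields are written name:type, wrapped in a named tuple.
@outputSchema("t:tuple(u1:int,u2:int,u3:int,u4:int,u5:int)")
def mos_list_to_tuple(values):
    """Split a comma-separated chararray into a tuple of five ints."""
    if values is None:
        return None
    parts = values.split(",")
    if len(parts) != 5:
        # Unexpected shape: return null rather than a partial tuple.
        return None
    return tuple(int(p) for p in parts)
```

Assuming the file is saved under a hypothetical name such as udf.py, it could be registered with REGISTER 'udf.py' USING jython AS myudf; and called as FLATTEN(myudf.mos_list_to_tuple(a2)) in the GENERATE clause.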
Answered 2013-08-12T13:53:21.890