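Untested, but this is the general approach I would take. Start with a relation holding an ID and a bag of values. Flatten it so that each row holds just the ID and a single value, take the distinct rows, then group by ID. This gives you a bag of values for each ID, which you can convert to a string if you want to output it.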
A = LOAD 'input' USING TextLoader() as line:chararray;
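-- Split each line once: the ID, then the remaining comma-separated values.
-- STRSPLIT returns a tuple, so FLATTEN it to get two separate fields.
B = FOREACH A GENERATE FLATTEN(STRSPLIT(line, ',', 2)) AS (id:chararray, values:chararray);
-- TOKENIZE returns a bag of single-value tuples; FLATTEN turns it into one (id, value) row per value.
C = FOREACH B GENERATE id, FLATTEN(TOKENIZE(values, ',')) AS value:chararray;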
D = DISTINCT C; -- I'm assuming you actually want distinct values; the question wasn't clear on this.
E = GROUP D by id;
-- BagToString uses '_' as its default separator, so pass ',' explicitly.
F = FOREACH E GENERATE group AS id, BagToString(D.value, ',') AS valueString:chararray;