apache-pig - Pig - Preserving the schema when updating a column

Question

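I want to use a function to update a column in a relation. I have figured out how to add a new column with the updated data and drop the old one, but the new column does not carry the field name I want to keep.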

For example, say students.txt is:

John    18      4.0
Mary    19      3.8
Bill    20      3.9
Joe     18      3.8

In Pig:

x = load 'students.txt' as (name:chararray, age:int, gpa:float);

dump x;
(John,18,4.0)
(Mary,19,3.8)
(Bill,20,3.9)
(Joe,18,3.8)

describe x;
x: {name: chararray,age: int,gpa: float}


y = foreach x generate name, (age==18?999:age), gpa;

dump y;
(John,999,4.0)
(Mary,19,3.8)
(Bill,20,3.9)
(Joe,999,3.8)

describe y;
y: {name: chararray,int,gpa: float}

How can I keep the name age for the second field, so that y has the same schema as x?

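Also, is there a simple way to keep every column in the dataset except the old version of this one? (That is, a star expression or project-range expression that skips one field.)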

Or is there a better way to approach this?

score 0 · Accepted Answer

I found a quick way to do this. The key is to put as [field name] right after the function or expression that produces the new value.

y = foreach x generate name, (age==18?999:age) as age, gpa;

dump y;
(John,999,4.0)
(Mary,19,3.8)
(Bill,20,3.9)
(Joe,999,3.8)

describe y;
y: {name: chararray,age: int,gpa: float}
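
For the second part of the question (keeping the other columns without listing each one), a project-range expression may help. The following is only a sketch, assuming a Pig version that supports project-range expressions (0.9 or later) and the same students.txt and schema as above. The explicit :int type on the alias is optional and added here for illustration. Pig has no expression that means "everything except one field", so the replaced column still has to sit between two ranges or named fields.

```pig
-- Same input and schema as in the question.
x = load 'students.txt' as (name:chararray, age:int, gpa:float);

-- Rewrite age in place and name the result with "as".
-- "gpa .." is a project-range expression: it projects gpa and every
-- column after it, keeping their names, so trailing columns need not
-- be listed one by one.
y = foreach x generate name, (age == 18 ? 999 : age) as age:int, gpa ..;

describe y;
```

With only three columns this saves nothing over naming gpa directly, but on a wide relation the same pattern (.. before_col, expression as col, after_col ..) avoids retyping every field.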
