Assume that the field time looks like 2013-01-01T00:00:00.000Z, that piggybank.jar has already been registered, and that EXTRACT has been defined (DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();). What is the best way to extract the fields year, month, day, hour, minute, and second? This is what I have so far:

data = FOREACH data GENERATE FLATTEN(EXTRACT(time, '(\\d+)-(\\d+)-(\\d+)T(\\d+):(\\d+):(\\d+)\\.(\\S+)'))
        AS (
            year: int,
            month: int,
            day: int,
            hour: int,
            minute: int,
            second: int,
            tail: chararray
        );
1 Answer

Starting with Pig 0.11, you can use the DateTime type.

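-- Load the timestamp as a string, then convert it to a DateTime
A = LOAD 'data' AS (date:chararray);
B = FOREACH A GENERATE ToDate(date) AS date;
-- Pull a single component out of the DateTime value
C = FOREACH B GENERATE GetMonth(date) AS month;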

The available functions are listed here: DateTime functions
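For the six fields asked about in the question, the same approach extends as in the sketch below. It assumes Pig 0.11 or later and that every value of time is a valid ISO 8601 string; the file name 'data' and the alias dt are placeholders for illustration, not something taken from the question.

```pig
-- Assumption: Pig 0.11+; 'data' is a placeholder input with one ISO 8601 timestamp per row
A = LOAD 'data' AS (time:chararray);

-- Convert the string to a DateTime once (dt is a hypothetical alias)
B = FOREACH A GENERATE ToDate(time) AS dt;

-- Read each component with the built-in DateTime accessors
C = FOREACH B GENERATE
        GetYear(dt)   AS year,
        GetMonth(dt)  AS month,
        GetDay(dt)    AS day,
        GetHour(dt)   AS hour,
        GetMinute(dt) AS minute,
        GetSecond(dt) AS second;
```

If the fractional part is needed as well, GetMilliSecond(dt) should return it in the same way.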

If you are not on 0.11, you can write a UDF or use the regular expression you posted.
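A rough sketch of the regex route without piggybank, assuming your Pig version ships the built-in REGEX_EXTRACT_ALL. The pattern has to match the whole string, so the dot before the milliseconds is escaped and the trailing Z is spelled out. The file name 'data' and the aliases are placeholders.

```pig
-- Assumption: the built-in REGEX_EXTRACT_ALL is available; 'data' is a placeholder input
A = LOAD 'data' AS (time:chararray);

-- Capture each component as a string; the pattern must match the entire value
B = FOREACH A GENERATE FLATTEN(
        REGEX_EXTRACT_ALL(time, '(\\d+)-(\\d+)-(\\d+)T(\\d+):(\\d+):(\\d+)\\.(\\d+)Z')
    ) AS (year:chararray, month:chararray, day:chararray,
          hour:chararray, minute:chararray, second:chararray, millis:chararray);

-- Cast the captured strings to integers explicitly
C = FOREACH B GENERATE
        (int)year   AS year,
        (int)month  AS month,
        (int)day    AS day,
        (int)hour   AS hour,
        (int)minute AS minute,
        (int)second AS second;
```

Rows whose time value does not match the pattern should come out with null fields rather than failing the job, so it may be worth filtering on year IS NOT NULL afterwards.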

Answered 2013-04-18T14:47:40.213