Here is the data:

123,456,789,q,w,e,r,20120513

123,77,88,8,jj,oo,"ooo,\r\n""d,\r\ndf,123",20120514

123,77,88,8,jj,oo,ooo,20120514

I want to use a Pig script to replace these literal \r\n sequences with line breaks.

Pig Script:

    REGISTER file:///usr/share/pig/contrib/piggybank/java/piggybank.jar;

    DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader;

    RAW = LOAD '/home/bannie/test/test.log' 
            USING CSVLoader()  AS (
                a: chararray, 
                b: chararray, 
                c: chararray, 
                d: chararray, 
                e: chararray, 
                f: chararray, 
                g: chararray, 
                h: chararray
            );

    C = FOREACH RAW GENERATE REPLACE(g, '\\\\r\\\\n', '\uxxxx') as max;

    grunt> C = FOREACH RAW GENERATE REPLACE(g, '\\\\r\\\\n', '\u000f') as max;
    grunt> C = FOREACH RAW GENERATE REPLACE(g, '\\\\r\\\\n', '\u000e') as max;
    grunt> C = FOREACH RAW GENERATE REPLACE(g, '\\\\r\\\\n', '\u000d') as max;
    2013-06-03 17:53:42,629 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 55, column 32>  mismatched input '(' expecting SEMI_COLON
    Details at logfile: /home/bannie/pig_1370249955149.log
    grunt> C = FOREACH RAW GENERATE REPLACE(g, '\\\\r\\\\n', '\u000a') as max;
    2013-06-03 17:53:47,601 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 55, column 32>  mismatched input '(' expecting SEMI_COLON
    Details at logfile: /home/bannie/pig_1370249955149.log
    grunt> C = FOREACH RAW GENERATE REPLACE(g, '\\\\r\\\\n', '\u000b') as max;
    grunt> C = FOREACH RAW GENERATE REPLACE(g, '\\\\r\\\\n', '\u000c') as max;

Does anyone know how to insert it?

1 Answer


You cannot insert anything into a file. Pig has the same limitations as Hadoop MapReduce. By default, the output is a directory containing some part-* files.
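As for the replacement itself, the pattern can be checked outside Pig. The sketch below assumes that Pig's built-in REPLACE hands its arguments to Java's `String.replaceAll` (my understanding of the built-in, not something confirmed in this thread), and that field g of the second sample row arrives unquoted as `ooo,\r\n"d,\r\ndf,123`. The class and method names are made up for illustration.

```java
// Hypothetical helper, not part of Pig or piggybank.
class ReplaceEscapedNewlines {

    // Replaces the literal four-character text \r\n (backslash, r, backslash, n)
    // with a single line feed character.
    static String unescapeCrLf(String field) {
        // The Java literal "\\\\r\\\\n" is the regex \\r\\n,
        // which matches a backslash, 'r', a backslash, 'n'.
        return field.replaceAll("\\\\r\\\\n", "\n");
    }

    public static void main(String[] args) {
        // Assumed value of field g after CSV unquoting of the second sample row.
        String g = "ooo,\\r\\n\"d,\\r\\ndf,123";
        System.out.println(unescapeCrLf(g));
    }
}
```

If that assumption holds, the regex used in the question already matches the literal sequences, and only the replacement argument needs to produce a real line feed.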

Answered 2013-06-06T05:57:52.283