Is there a way I can read each line of a logfile into its own field? I thought that using '\n' as the delimiter should achieve that.

File - test

Audit file /u01/app/oracle/admin/st01/adump/st011_ora_27063_1.aud
Node name:      test0041
CLIENT USER:[6] 'oracle'

So I would like to read this into three fields as

filename - Audit file /u01/app/oracle/admin/st01/adump/st011_ora_27063_1.aud
nodename - Node name:      test0041
username - CLIENT USER:[6] 'oracle'

I tried this, but it didn't help.

A = LOAD 'test' using PigStorage ('\n') AS (filename, nodename, username);

2 Answers


If your file is that small, why not preprocess it, e.g. convert each \n to \t before the LOAD?
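A minimal sketch of that preprocessing step in Python, under a few assumptions that are not in the question: the function name `lines_to_tsv` is made up for illustration, blank lines are skipped, and every three consecutive lines are treated as one record.

```python
def lines_to_tsv(text, fields_per_record=3):
    """Join every `fields_per_record` non-empty lines into one tab-separated record.

    Assumption: the log is a repeating group of three lines
    (audit file, node name, client user), as in the sample.
    """
    # Drop blank lines so they do not turn into empty fields.
    lines = [line for line in text.splitlines() if line.strip()]
    records = []
    for start in range(0, len(lines), fields_per_record):
        # One group of lines becomes one record; tabs separate the fields.
        records.append("\t".join(lines[start:start + fields_per_record]))
    # Each record ends with a newline, which Pig reads as the record delimiter.
    return "".join(record + "\n" for record in records)
```

If the result is written to a new file (say `test_tsv`, a hypothetical name), it should load with the tab delimiter, which is also the PigStorage default: `A = LOAD 'test_tsv' USING PigStorage('\t') AS (filename, nodename, username);`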

answered 2013-03-15T06:20:51.843

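You can't use '\n' as the delimiter for PigStorage. According to the Pig10 docs:

Record delimiters: for load statements, Pig interprets the line feed ('\n'), carriage return ('\r' or CTRL-M) and combined CR + LF ('\r\n') characters as record delimiters (do not use these characters as field delimiters). For store statements, Pig uses the line feed ('\n') character as the record delimiter.

If you want to parse the log file, you will have to write a custom loader.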

answered 2013-03-02T22:01:16.630