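I'm trying to load a data file in a Pig Latin script. The data has 2 columns, but the second column is wrapped in a text qualifier (double quotes) and contains commas. Sample data: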

DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"

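When I load the data as shown below, the second column is not recognized as a single column, because PigStorage(',') also splits on the commas inside the quotes: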

deviceList = load 'deviceList.csv' Using PigStorage(',') as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );

How can I define a text qualifier when loading the data set?

1 Answer

Try this, and let me know if you need a different output format.

input.txt

DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"

Pig script:

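-- Load each row as a single field; the default tab delimiter leaves the commas untouched.
A = LOAD 'input.txt' AS line;
-- Split on the first comma only: group 1 is the ID, group 2 is the rest of the line.
deviceList = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\w+),(.*)$')) as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );
DUMP deviceList;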

Output:

(DEVICE_ID,SUPPORTED_TECH)
(a2334,"GSM900,GSM1500,GSM200")
(a54623,"GSM900,GSM1500")
(a86646,"GSM1500,GSM200")
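
Note that the regex approach keeps both the header row and the surrounding quotes. As an alternative sketch, assuming Piggybank is available on your cluster, its CSVExcelStorage loader treats double quotes as a text qualifier, so the embedded commas stay inside SUPPORTED_TECH. The jar path below is a placeholder, and the 'SKIP_INPUT_HEADER' option assumes a Pig release recent enough to support header handling (0.12 or later, as far as I know).

```pig
-- The path to piggybank.jar is an assumption; adjust it for your installation.
REGISTER /path/to/piggybank.jar;

-- CSVExcelStorage parses quoted fields, so commas inside "..." do not split the column.
-- 'NO_MULTILINE' and 'UNIX' are the loader's defaults for multiline and end-of-line handling.
deviceList = LOAD 'deviceList.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (DEVICE_ID:chararray, SUPPORTED_TECH:chararray);

DUMP deviceList;
```

With this loader the header line should be skipped and the quotes stripped from SUPPORTED_TECH. If you are on an older Pig version, calling CSVExcelStorage() with no arguments and filtering out the header row yourself may be the safer route.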
answered 2014-11-11T13:05:44.683