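I ran into a strange problem when loading several text files and converting them to Avro format with Pig in a single script. If I read and convert one file at a time in separate runs, everything works fine. The error message is as follows: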

2012-08-21 19:15:32,964 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum 1980-01-01 00:00:00.000 is not in union ["null","long"]
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:612)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapB

My code is:

set1 = load '$input_dir/set1.txt' using PigStorage('|') as (
   id:long,
   f1:long,
   f2:chararray,
   f3:float,
   f4:float,
   f5:float,
   f6:float,
   f7:float,
   f8:float,
   f9:float,
   f10:float,
   f11:float,
   f12:float);
store set1 into '$output_dir/set1.avro'
using org.apache.pig.piggybank.storage.avro.AvroStorage();

set2 = load '$input_dir/set2.txt' using PigStorage('|') as (
   id : int,
   date : chararray);
store set2 into '$output_dir/set2.avro'
using org.apache.pig.piggybank.storage.avro.AvroStorage();

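The first file is converted fine, but the second one fails. The error comes from the second field of the second file, yet strangely I don't even have a "long" in that schema, while the error message shows ["null","long"].

I am using Pig 0.10.0 and avro-1.7.1.jar.

I would like to know whether this is a bug or whether I am missing something.

Thanks. Dan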

Here is set1.txt:

827352|740214|Long|26|0.08731795012183759|1661335.541733333|0|0|0.001057865808239878|0.001059541098077884|0.001059541098077821|0.0514156486228232|0.001043980181757539
827353|740214|Short|12|-0.05967910581502997|-1135471.22271|0|0|-0.001185620143839061|-0.001187497751909232|-0.001187497751909183|-0.0747641932858414|-0.0001307449002148424
827354|740214|Total|38|0.02763884430680765|19026277.40819863|0|0|-0.0001277543355991829|-0.0001279566538313473|-0.0001279566538313626|-0.02334854466301821|0.0009132352815426966
827193|739576|Long|26|0.08731795012183759|1661335.541733333|0|0|0.001057865808239878|0.001059541098077884|0.001059541098077821|0.0514156486228232|0.001043980181757539
827194|739576|Short|12|-0.05967910581502997|-1135471.22271|0|0|-0.001185620143839061|-0.001187497751909232|-0.001187497751909183|-0.0747641932858414|-0.0001307449002148424
827195|739576|Total|38|0.02763884430680765|19026277.40819863|0|0|-0.0001277543355991829|-0.0001279566538313473|-0.0001279566538313626|-0.02334854466301821|0.0009132352815426966
827355|740215|Long|51|1.776868012839072|113652088.7063555|0|0|0.01952547658695701|0.0195703176808393|0.01957031768083928|1.164818333642054|0
827356|740215|Short|34|-2.360589090333165|-150988074.9471841|0|0|-0.00868330219442376|-0.008616238065508337|-0.008616238065508375|-0.5943698959308671|-0.02690679230502523
827357|740215|Total|85|-0.5837210774940929|63962032.00527128|0|0|0.01084217439253325|0.01095407961533095|0.0109540796153309|0.5704484377111866|-0.02690679230502523
827202|739590|Long|53|1.777568428360522|113696888.7063555|0|0|0.01952547658695701|0.0195703176808393|0.01957031768083928|1.156653489849146|0

Here is set2.txt:

1|1980-01-01 00:00:00.000
2|1980-01-02 00:00:00.000
3|1980-01-03 00:00:00.000
4|1980-01-04 00:00:00.000
5|1980-01-07 00:00:00.000
6|1980-01-08 00:00:00.000
7|1980-01-09 00:00:00.000
8|1980-01-10 00:00:00.000
9|1980-01-11 00:00:00.000
10|1980-01-14 00:00:00.000
1 Answer

It seems Pig needs all the loads to come first. Please try:

set1 = load '$input_dir/set1.txt' using PigStorage('|');
set2 = load '$input_dir/set2.txt' using PigStorage('|');

--other logic
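
Applied to the script in the question, the suggestion might look like the sketch below: both load statements come first and the two store statements follow. The schemas are copied from the question (the snippet above omits them), and this ordering is only an assumption about what the answer intends; it has not been verified here against Pig 0.10.0.

```pig
-- Sketch only: load both inputs before storing either one.
-- Schemas are taken from the question; the ordering is an assumed reading of the answer.
set1 = load '$input_dir/set1.txt' using PigStorage('|') as (
   id:long, f1:long, f2:chararray,
   f3:float, f4:float, f5:float, f6:float, f7:float,
   f8:float, f9:float, f10:float, f11:float, f12:float);

set2 = load '$input_dir/set2.txt' using PigStorage('|') as (
   id:int,
   date:chararray);

-- Stores come after all loads.
store set1 into '$output_dir/set1.avro'
using org.apache.pig.piggybank.storage.avro.AvroStorage();

store set2 into '$output_dir/set2.avro'
using org.apache.pig.piggybank.storage.avro.AvroStorage();
```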
answered 2012-08-23T08:45:10.097