I must read Avro record serialized in avro files in HDFS. To do that, I use the AvroKeyInputFormat, so my mapper is able to work with the read records as keys.
My question is, how can I control the split size? With the text input format it consists on define the size in bytes. Here I need to define how many records every split will consist of.
I would like to manage every file in my input directory like a one big file. Have I to use CombineFileInputFormat? Is it possible to use it with Avro?