I'm using hdfs -put to load a large 20GB file into hdfs. Currently the process runs @ 4mins. I'm trying to improve the write time of loading data into hdfs. I tried utilizing different block sizes to improve write speed but got the below results:
512M blocksize = 4mins;
256M blocksize = 4mins;
128M blocksize = 4mins;
64M blocksize = 4mins;
Does anyone know what the bottleneck could be and other options I could explore to improve performance of the -put cmd?