I come from a SQL Server background, so it can be a bit difficult to picture exactly what happens to data when it goes into Hadoop.
My understanding is that if you have a book in text format, maybe around 200 KB or so, you simply copy the data into Hadoop and it becomes searchable. But does that data get packed into a block alongside other data so that HDFS can work more efficiently, or does it remain a standalone 200 KB file in HDFS, hurting performance?
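To make the question concrete, here is a rough sketch of how I imagine checking this with the HDFS Java API (the path is made up, and I may well be misunderstanding what these calls report):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BookBlockCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path to the book after copying it into HDFS
        Path book = new Path("/books/mobydick.txt");

        FileStatus status = fs.getFileStatus(book);
        System.out.println("File length (bytes):   " + status.getLen());
        System.out.println("Configured block size: " + status.getBlockSize());

        // Ask the NameNode which blocks actually hold the file's bytes
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        System.out.println("Blocks used: " + blocks.length);
    }
}
```

I'd expect this to report one block for a 200 KB file, but what I can't tell is whether that block then wastes the rest of its 64 MB (or whatever dfs.block.size is set to) on disk.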
Also, is a Block the same thing as what is often called a Tablet in Bigtable?
Thanks a lot for your help.

FlyMario