I have recently switched from an RDBMS to HBase for handling millions of records. As a newbie, I am not sure what the efficient way of designing an HBase schema is. The scenario is this: I have text files containing hundreds of thousands to millions of records that I have to read and store into HBase. There are two sets of text files (a RawData file and a Label file) which are linked to each other because they belong to the same user. For these files I have created two separate tables (RawData and Label) where I store their contents. The RawData file and RawData table look like this:
As you can see, in my RawData table the row key is the name of the text file (01-01-All-Data.txt) combined with the row number of each row of the text file. The column family is just an arbitrary 'r', the column qualifiers are the columns of the text file, and the values are the values of those columns. This is how I am inserting records into my table. I also have a third table (MapFile) where I store the text file name as the row key, the user's id as the column qualifier, and the total number of records in the text file as the value, which looks like this:
01-01-All-Data.txt column=m:1, timestamp=1375189274467, value=146209
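To make the layout concrete, here is a minimal sketch of how one line of the text file maps onto the row key and cells described above. The helper name and the column names ("time", "value") are placeholders I made up for illustration, not the real columns of my file:

```python
def rawdata_cells(file_name, row_number, header, fields):
    """Build (row_key, {family:qualifier -> value}) for one line of the text file."""
    # Row key = file name plus the row number, e.g. "01-01-All-Data.txt-1"
    row_key = f"{file_name}-{row_number}"
    # Column family 'r'; qualifier = the text-file column name
    cells = {f"r:{col}": val for col, val in zip(header, fields)}
    return row_key, cells

row_key, cells = rawdata_cells("01-01-All-Data.txt", 1,
                               ["time", "value"], ["00:00:01", "42"])
# row_key == "01-01-All-Data.txt-1"
# cells   == {"r:time": "00:00:01", "r:value": "42"}
```

One thing I am unsure about: since HBase sorts row keys lexicographically, row numbers would probably need zero-padding (e.g. `-000001`) for the rows to scan back in file order.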
I will use the MapFile table in order to read the RawData table row by row.
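The row-by-row read driven by MapFile would work roughly like this: take the file name and record count from MapFile, regenerate each RawData row key, and fetch that row. A sketch of the key generation (assuming 1-based row numbers):

```python
def rawdata_row_keys(file_name, total_records):
    """Yield the RawData row keys for one file listed in the MapFile table."""
    for i in range(1, total_records + 1):
        yield f"{file_name}-{i}"

keys = list(rawdata_row_keys("01-01-All-Data.txt", 3))
# keys == ["01-01-All-Data.txt-1", "01-01-All-Data.txt-2", "01-01-All-Data.txt-3"]
```

In practice each generated key would become a Get against the RawData table, or the file name could be used as the start of a prefix Scan instead of issuing individual Gets.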
What is your suggestion about this kind of HBase schema? Is it a proper approach, or does it not make sense in terms of HBase concepts?
Furthermore, it is worth mentioning that it takes around 3 minutes to insert a 21 MB file with 146,207 rows into HBase.
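I suspect the load time comes from issuing one Put per row. A common mitigation, as far as I understand, is to buffer rows client-side and send them in batches (the Java client accepts a list of Puts). Here is a language-agnostic sketch of the chunking; the batch size of 1000 is just a guess to be tuned:

```python
def batches(rows, batch_size=1000):
    """Group parsed file rows into fixed-size batches for bulk Puts."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch       # flush one full batch to HBase
            batch = []
    if batch:
        yield batch           # flush the final partial batch

sizes = [len(b) for b in batches(range(2500), 1000)]
# sizes == [1000, 1000, 500]
```

Would batching like this (or disabling auto-flush on the client) be the right direction, or is bulk loading via MapReduce the expected approach at this scale?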
Please advise.
Thanks