Greetings, users!
I have installed Flume on my Cloudera 4.6 cluster, and I am trying to collect tweets from Twitter.
I created an HDFS sink and an HBase sink, and both are gathering tweets, but the data in HBase is not well structured.
Because the data is not structured, I can't query it with Impala.
I created a table 'tweets' with four column families: {NAME => 'tweet'}, {NAME => 'retweet'}, {NAME => 'entities'}, {NAME => 'user'}
and my Flume configuration is here: http://pastebin.com/4b5d3R8Q
I am following this tutorial, but I don't know what to do with its serializer:
https://github.com/AronMacDonald/Twitter_Hbase_Impala Do I have to build it into a jar?
This is what I currently have in HBase: http://pastebin.com/aNGBsvB7 Everything ends up in the column tweets...
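
To make the goal concrete, here is roughly the layout I am hoping for. This is only a Python sketch of the mapping, not the Flume serializer itself (which would have to be Java). The function name tweet_to_cells is made up for illustration, and I am assuming each event body is one tweet as Twitter JSON with fields such as id_str, user, entities and retweeted_status.

```python
import json


def tweet_to_cells(raw_json):
    """Split one tweet (a JSON string) into (row_key, {"family:qualifier": value}).

    Assumption: the four column families are the ones from my 'tweets' table
    (tweet, retweet, entities, user) and the row key is the tweet's id_str.
    """
    status = json.loads(raw_json)
    cells = {}

    # Top-level scalar fields go into the 'tweet' family.
    for key, value in status.items():
        if value is None or isinstance(value, (dict, list)):
            continue
        cells["tweet:" + key] = str(value)

    # Each nested object goes into its own family, one qualifier per field.
    nested = {
        "user": status.get("user"),
        "entities": status.get("entities"),
        "retweet": status.get("retweeted_status"),
    }
    for family, obj in nested.items():
        if not isinstance(obj, dict):
            continue  # e.g. no retweeted_status on an original tweet
        for key, value in obj.items():
            if value is None:
                continue
            if isinstance(value, (dict, list)):
                value = json.dumps(value)  # keep deeper structures as JSON text
            cells[family + ":" + key] = str(value)

    return status["id_str"], cells
```

With a layout like this, each field would be its own column (for example tweet:text or user:screen_name) instead of one big value, which is what I think I need before I can map the table for Impala.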