Greetings, users!

I have installed Flume on my Cloudera 4.6 and I am trying to collect tweets from Twitter.

I created an HDFS sink and an HBase sink, and they are gathering tweets, but the data in HBase is not well structured.

Because the data is not structured, I can't query it with Impala.

I created a table tweets with these column families: {NAME => 'tweet'}, {NAME => 'retweet'}, {NAME => 'entities'}, {NAME => 'user'}

My Flume configuration is here: http://pastebin.com/4b5d3R8Q

I am following this tutorial, but I don't know what to do with its serializer:

https://github.com/AronMacDonald/Twitter_Hbase_Impala Do I have to build it into a jar?

This is what I currently have in HBase: http://pastebin.com/aNGBsvB7 Everything is in the column tweets...

1 Answer

I recompiled and used the flume-sources-1.0-SNAPSHOT.jar from the Git repository https://github.com/cloudera/cdh-twitter-example, and after that there was no problem when using 'TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource'.

Install Maven, then download the cdh-twitter-example repository.

Unzip it, then run the following inside the extracted directory (as mentioned):

$ cd flume-sources

$ mvn package

$ cd ..
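
After the build it can help to confirm that the jar was actually produced before pointing Flume at it. Here is a minimal sketch; find_flume_jar is a hypothetical helper name (not part of the repository), and the target/ location assumes Maven's default output directory:

```shell
# Hypothetical helper (not part of cdh-twitter-example).
# Prints the path of the first flume-sources jar found under <dir>/target,
# assuming Maven's default output directory; returns 1 if none is found.
find_flume_jar() {
  for jar in "$1"/target/flume-sources-*.jar; do
    # If the glob matched nothing, the literal pattern fails this -f check.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no flume-sources jar under $1/target; did mvn package succeed?" >&2
  return 1
}
```

Calling find_flume_jar flume-sources from the repository root should print the path of the freshly built jar. Where you then copy it depends on how your Flume installation loads extra jars.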

This problem appeared when twitter4j was updated from 2.2.6 to 3.x: the method setIncludeEntities was removed, and the JAR is not up to date.

PS: Do not download the prebuilt version; it is still the old one.

Answered 2014-07-21T12:05:36.810