I am having a hard time figuring out a serialization issue in Hadoop. Here is my class:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Writable;

public class CrawlerTweet implements Serializable, Writable {

    private static final long serialVersionUID = 1L;

    private String keywords;
    // Initialized eagerly so readFields() can repopulate it on a fresh instance.
    private List<TweetStatus> tweets = new ArrayList<TweetStatus>();
    private long queryTime = 0;

    public CrawlerTweet() {}

    public CrawlerTweet(String keys, List<TweetStatus> tweets, long queryTime) {
        this.keywords = keys;
        this.tweets = tweets;
        this.queryTime = queryTime;
    }

    public static CrawlerTweet read(DataInput in) throws IOException {
        CrawlerTweet ts = new CrawlerTweet();
        ts.readFields(in);
        return ts;
    }

    @Override
    public void readFields(DataInput din) throws IOException {
        // Must mirror write() exactly: same fields, same order, same encodings.
        queryTime = din.readLong();
        keywords = din.readUTF();
        tweets.clear();
        int n = din.readInt();
        while (n-- > 0) {
            TweetStatus ts = new TweetStatus();
            ts.readFields(din);
            tweets.add(ts);
        }
    }

    @Override
    public void write(DataOutput dout) throws IOException {
        dout.writeLong(queryTime);
        // writeUTF/readUTF keep the string encoding symmetric (fine for short keywords).
        dout.writeUTF(keywords);
        dout.writeInt(tweets.size());
        for (TweetStatus ts : tweets)
            ts.write(dout);
    }

    public String getKeywords() {
        return keywords;
    }

    public List<TweetStatus> getTweets() {
        return tweets;
    }

    public long getQueryTime() {
        return queryTime;
    }
}
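To rule out an asymmetric write()/readFields() pair, I would expect a local round trip like the following to reproduce the object. This is just illustrative test scaffolding (the class name and sample values are mine), not part of the real code base:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;

// Round-trip check: serialize with write(), read back with read(), compare fields.
public class CrawlerTweetRoundTrip {
    public static void main(String[] args) throws IOException {
        CrawlerTweet original =
                new CrawlerTweet("hadoop", new ArrayList<TweetStatus>(), 42L);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        CrawlerTweet copy = CrawlerTweet.read(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        // Should print the same keywords and queryTime that went in.
        System.out.println(copy.getKeywords() + " / " + copy.getQueryTime());
    }
}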
If I implement both the Serializable and Writable interfaces, I get the exception below (thrown, per the trace, while ObjectInputStream resolves the class inside the server's connection handler):
java.lang.ClassNotFoundException: bigdat.twitter.dto.CrawlerTweet
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:601)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1572)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1493)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1729)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
at focusedCrawler.util.storage.socket.ServerConnectionHandler.buildRequestObject(ServerConnectionHandler.java:136)
at focusedCrawler.util.storage.socket.ServerConnectionHandler.run(ServerConnectionHandler.java:340)
And if I implement only Writable, I get a NotSerializableException:
Erro de comunicacao: bigdat.twitter.dto.CrawlerTweet
Dormindo 5 mls
[21/JUN/2013:11:23:39] [SocketAdapterFactory] [produce] [hadoop22:3190]
Erro de comunicacao: bigdat.twitter.dto.CrawlerTweet
java.io.NotSerializableException: bigdat.twitter.dto.CrawlerTweet
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1164)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
at focusedCrawler.util.storage.socket.StorageRemoteAdapter.serializeParamObject(StorageRemoteAdapter.java:113)
at focusedCrawler.util.storage.socket.StorageRemoteAdapter.defaultMethod(StorageRemoteAdapter.java:205)
at focusedCrawler.util.storage.socket.StorageRemoteAdapter.insert(StorageRemoteAdapter.java:289)
at focusedCrawler.util.storage.distribution.StorageRemoteAdapterReconnect.insert(StorageRemoteAdapterReconnect.java:213)
at bigdat.twitter.crawler.CrawlTwitter.download(Unknown Source)
at bigdat.twitter.crawler.CrawlTwitter.run(Unknown Source)
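Since the remote storage layer apparently serializes the request with ObjectOutputStream/ObjectInputStream (see the traces above), my understanding is that CrawlerTweet and every non-transient field type it holds must implement Serializable as well, including TweetStatus. A minimal sketch of what I assume that would mean (the placeholder field is mine; the real fields of TweetStatus are not shown in this post):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.Serializable;

import org.apache.hadoop.io.Writable;

// Hypothetical sketch: if Java serialization is used anywhere on the path,
// every type reachable from CrawlerTweet must itself be Serializable.
public class TweetStatus implements Serializable, Writable {
    private static final long serialVersionUID = 1L;

    private String text; // placeholder field, not the real class

    @Override
    public void write(DataOutput dout) throws IOException {
        dout.writeUTF(text);
    }

    @Override
    public void readFields(DataInput din) throws IOException {
        text = din.readUTF();
    }
}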
Some more information, extracted from the comments: CrawlerTweet is packaged in the BDAnalytics16.jar file, and the job is launched with:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/rgupta/bdAnalytics/lib/*
hadoop jar $jarpath/BDAnalytics16.jar bigdat.twitter.crawler.CrawlTwitter \
$crwlInputFile > $logsFldr/crawler_$1.log 2>&1 &
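For what it is worth, one way to double-check that the class really ends up inside the jar (reusing the same $jarpath variable as above):

jar tf $jarpath/BDAnalytics16.jar | grep CrawlerTweet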
Any help would be greatly appreciated. Thanks!