4

I'm working on processing Tweets from Twitter and storing them in a database (MySQL).

My process runs fine most of the time, but occasionally I get an error like this one:

2012-08-31 08:11:23,303 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper  - SQL Error: 1366, SQLState: HY000
2012-08-31 08:11:23,304 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper  - Incorrect string value: '\xF0\x9F\x98\x9D #...' for column 'twe_text' at row 1

When looking for the problematic tweet in my logs I find the following one:

 2012-08-31 08:11:22,971 INFO com.myapp.TweetLoaderJob  - Text for tweet 241175722096480256: RT @totallytoyosi_: My goodies, my goodies, not your goodies  <U+1F61D> #m&ms #sweeties #goodies #food  @ The Ritzy Cinema Café, Brixton htt ...

And finally, after looking up what <U+1F61D> is, I discovered that it is an emoticon, which Twitter sends as-is.

I debugged the job against this specific tweet only, and Eclipse does not seem to recognize the character. So the question is: how can I handle this exception? I looked into reconfiguring my MySQL database, but I cannot change its encoding (it's a requirement), so my options are to skip this kind of tweet or to strip the problematic character.

But how can I do that if Java does not recognize the character?


1 Answer

1

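Before storing the string in the database, you can filter it and remove the unwanted parts (using a simple regular expression such as <U+[^>]+>, for example).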
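As a rough sketch of that filtering step in Java: the class and method names below are hypothetical. The first method applies the regular expression from the answer and only helps if the text literally contains a placeholder such as <U+1F61D>. The second assumes that the real string holds the actual emoticon, which Java represents as a surrogate pair, and that the column uses MySQL's 3-byte utf8 character set. The 4-byte sequence \xF0\x9F\x98\x9D in the error message suggests this, but it is not confirmed in the question. Under that assumption, dropping every code point above U+FFFF should avoid the error.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original job.
final class TweetTextFilter {

    // Regex from the answer: matches textual placeholders such as "<U+1F61D>".
    private static final Pattern PLACEHOLDER = Pattern.compile("<U+[^>]+>");

    private TweetTextFilter() {
    }

    // Removes literal "<U+....>" placeholders from the text.
    static String stripPlaceholders(String text) {
        return PLACEHOLDER.matcher(text).replaceAll("");
    }

    // Removes code points outside the Basic Multilingual Plane (above U+FFFF).
    // Assumption: these are the characters the MySQL column rejects.
    static String stripSupplementary(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!Character.isSupplementaryCodePoint(cp)) {
                sb.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Calling `TweetTextFilter.stripSupplementary(tweetText)` just before persisting the entity would keep the rest of the tweet intact and drop only the emoticon; `tweetText` here stands for whatever variable holds the tweet body.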

answered 2012-08-31T11:21:50.033