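I am developing a news reader app for my website, http://www.werchelsea.com/, which fetches the latest news from the feed at http://www.werchelsea.com/feed/atom/. I can fetch the feed correctly and convert it to a string. My main problem now is that the feed's description contains data with HTML tags, for example: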
<p>It was Raul Meireles who came from the Merseyside to London to complete his move from Liverpool to Chelsea on the dead line day of the summer transfer window last year, when Chelsea failed to sign the highly-rated midfielder, Luka Modric. Chelsea were left with no other choice but to sign the Portuguese midfielder.</p>
<p>Meireles was a regular starter under the management of Villas-Boas, he really enjoyed working under
<a href='http://www.werchelsea.com/2012/09/05/time-to-say-good-bye-to-raul-meireles/303777_153113331443746_1122718871_n/' title='303777_153113331443746_1122718871_n'><img width="150" height="150" src="http://www.werchelsea.com/wp-content/uploads/2012/09/303777_153113331443746_1122718871_n-150x150.jpg" class="attachment-thumbnail" alt="Meireles first training session with Chelsea football club" title="303777_153113331443746_1122718871_n" /></a>
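I tried replacing all of these tags using regular expressions, but for some reason I cannot find the right regex to match every kind of HTML tag. This is the code I am using for the replacement: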
protected String doInBackground(String... arg0) {
    String response = "";
    try {
        URL feedwebsite = new URL(feedURL);
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLHandler feedHandler = new XMLHandler();
        XMLReader feedReader = sp.getXMLReader();
        feedReader.setContentHandler(feedHandler);
        InputSource is = new InputSource(feedwebsite.openStream());
        feedReader.parse(is);
        response = feedHandler.getParsedFeed()
                .replaceAll("<" + "[0-9a-zA-Z]+" + ">", "_")
                .replaceAll("</" + "[0-9a-zA-Z]+" + ">", "-")
                .replaceAll("<" + "[0-9a-zA-Z]+" + "/>", ".");
    } catch (Exception e) {
        response = "Cannot Connect to the server.Please Check your Wifi/Data   Connection.";
        e.printStackTrace();
    }
    return response;
}
Is replacing the tags with a regex the right approach, or is there a better way to do this? Any help would be appreciated.
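
For context, the patterns above only match bare tags such as `<p>` or `</p>`. A tag that carries attributes or whitespace, such as `<a href='...'>` or `<img ... />`, contains characters outside `[0-9a-zA-Z]` and is therefore skipped. Below is a minimal sketch of one possible fix, not a definitive implementation. It assumes the descriptions are simple, well-formed markup in which no attribute value contains a `>` character. The class name `HtmlTagStripper` and the method `stripTags` are hypothetical names introduced only for this illustration, and the sample string in `main` is made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original code.
class HtmlTagStripper {
    // '<', then any run of characters that are not '>', then '>'.
    // This covers opening tags with attributes, closing tags and self-closing tags.
    // Assumption: no attribute value contains a literal '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String stripTags(String html) {
        if (html == null) {
            return "";
        }
        // Replace each tag with a space so adjacent words do not run together.
        String text = TAG.matcher(html).replaceAll(" ");
        // Decode a handful of common entities; "&amp;" goes last so that
        // an escaped entity such as "&amp;lt;" is not decoded twice.
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&#39;", "'")
                   .replace("&amp;", "&");
        // Collapse the whitespace left behind by removed tags.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String sample = "<p>Meireles was a <a href='http://example.com/x' title='t'>"
                + "<img width=\"150\" src=\"a.jpg\" /></a>regular starter.</p>";
        System.out.println(stripTags(sample));
    }
}
```

With a helper like this, the three chained `replaceAll` calls could be replaced by a single `stripTags(feedHandler.getParsedFeed())`. A regex is still a blunt tool for HTML. If this is an Android app (the `doInBackground` signature suggests an `AsyncTask`), `android.text.Html.fromHtml(html).toString()` is another option worth trying, since it parses the markup rather than pattern-matching it.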