I am trying to retrieve data from an RSS feed. My program works well, with one exception. The feed's items are structured like this:

<title></title>
<link></link>
<description></description>

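I can retrieve the data, but when a title contains an "&" character, the returned string stops at the character before it. For example, take this title: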

<title>A&amp;T To Play Four Against Bears</title>

I only get 'A', when I expect 'A&T To Play Four Against Bears'.

Can anyone tell me whether I can modify my existing RSSReader class to account for the presence of the "&"?

import android.util.Log;

import java.net.URL;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RSSReader {

private static RSSReader instance = null;

private RSSReader() {
}

public static RSSReader getInstance() {
    if (instance == null) {
        instance = new RSSReader();
    }
    return instance;
}

public ArrayList<Story> getStories(String address) {
    ArrayList<Story> stories = new ArrayList<Story>();
    try {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        URL u = new URL(address);
        Document doc = builder.parse(u.openStream());
        NodeList nodes = doc.getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            Story currentStory = new Story(getElementValue(element, "title"),
                    getElementValue(element, "description"),
                    getElementValue(element, "link"),
                    getElementValue(element, "pubDate"));
            stories.add(currentStory);
        }//for
    }//try
    catch (Exception ex) {
        if (ex instanceof java.net.ConnectException) {
        }
    }
    return stories;
}

private String getCharacterDataFromElement(Element e) {
    try {
        Node child = e.getFirstChild();
        if (child instanceof CharacterData) {
            CharacterData cd = (CharacterData) child;
            return cd.getData();
        }
    } catch (Exception ex) {
        Log.i("myTag2", ex.toString());
    }
    return "";
} //private String getCharacterDataFromElement

protected float getFloat(String value) {
    if (value != null && !value.equals("")) {
        return Float.parseFloat(value);
    } else {
        return 0;
    }
}

protected String getElementValue(Element parent, String label) {
    return getCharacterDataFromElement((Element) parent.getElementsByTagName(label).item(0));
}

}

Any ideas on how to solve this?

2 Answers

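I tested the RSS feed with the parser I use, and it parsed as shown below. The feed appears to be parseable, but as I wrote in my comment, because CDATA is used and the content is also escaped, some text (see the descriptions) still comes through in the escaped form "A&amp;T". You can replace those after parsing the XML.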

D/*** TITLE      : A&T To Play Four Against Longwood
D/*** DESCRIPTION: A&amp;T baseball takes a break from conference play this weekend.
D/*** TITLE      : Wilkerson Named MEAC Rookie of the Week
D/*** DESCRIPTION: Wilkerson was 6-for-14 for the week of April 9-15.
D/*** TITLE      : Lights, Camera, Action
D/*** DESCRIPTION: A&amp;T baseball set to play nationally televised game on ESPNU.
D/*** TITLE      : Resilient Aggies Fall To USC Upstate
D/*** DESCRIPTION: Luke Tendler extends his hitting streak to 10 games.
D/*** TITLE      : NCCU Defeats A&T In Key Conference Matchup
D/*** DESCRIPTION: Kelvin Freeman leads the Aggies with three hits.
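
As a rough illustration of the "replace them after parsing" idea, here is a minimal sketch. The class name EntityCleanup and the method unescape are my own, hypothetical names and are not part of the parser shared below. It only handles the five predefined XML entities plus the "&#39;" form that shows up in this feed, so treat it as a starting point rather than a complete unescaper.

```java
public class EntityCleanup {

    // Hypothetical helper: turns leftover escaped entities in already-parsed text
    // back into plain characters. "&amp;" is replaced last so that a doubly
    // escaped value such as "&amp;lt;" becomes "&lt;" and not "<".
    static String unescape(String text) {
        if (text == null) {
            return "";
        }
        return text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&apos;", "'")
                   .replace("&#39;", "'")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // Sample description text taken from the log output above
        System.out.println(unescape("A&amp;T baseball takes a break from conference play this weekend."));
    }
}
```

In the parser below, this could be applied to the value returned by parser.nextText() before it is stored, for example currentMessage.setDescription(EntityCleanup.unescape(parser.nextText())).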

I am sharing most of the RSS feed parser I use, so you can compare it with yours and see what differs.

XmlPullFeedParser.java

package com.nesim.test.rssparser;

import java.util.ArrayList;
import java.util.List;

import org.xmlpull.v1.XmlPullParser;

import android.util.Log;
import android.util.Xml;

public class XmlPullFeedParser extends BaseFeedParser {

  public XmlPullFeedParser(String feedUrl) {
    super(feedUrl);
  }

  public List<Message> parse() {
    List<Message> messages = null;
    XmlPullParser parser = Xml.newPullParser();
    try {
      // auto-detect the encoding from the stream
      parser.setInput(this.getInputStream(), null);
      int eventType = parser.getEventType();
      Message currentMessage = null;
      boolean done = false;
      while (eventType != XmlPullParser.END_DOCUMENT && !done){
        String name = null;
        switch (eventType){
          case XmlPullParser.START_DOCUMENT:
            messages = new ArrayList<Message>();
            break;
          case XmlPullParser.START_TAG:
            name = parser.getName();
            if (name.equalsIgnoreCase(ITEM)){
              currentMessage = new Message();
            } else if (currentMessage != null){
              if (name.equalsIgnoreCase(LINK)){
                currentMessage.setLink(parser.nextText());
              } else if (name.equalsIgnoreCase(DESCRIPTION)){
                currentMessage.setDescription(parser.nextText());
              } else if (name.equalsIgnoreCase(PUB_DATE)){
                currentMessage.setDate(parser.nextText());
              } else if (name.equalsIgnoreCase(TITLE)){
                currentMessage.setTitle(parser.nextText());
              } else if (name.equalsIgnoreCase(DATES)){
                currentMessage.setDates(parser.nextText());
              } 
            }
            break;
          case XmlPullParser.END_TAG:
            name = parser.getName();
            if (name.equalsIgnoreCase(ITEM) && currentMessage != null){
              messages.add(currentMessage);
            } else if (name.equalsIgnoreCase(CHANNEL)){
              done = true;
            }
            break;
        }
        eventType = parser.next();
      }
    } catch (Exception e) {
      Log.e("AndroidNews::PullFeedParser", e.getMessage(), e);
      throw new RuntimeException(e);
    }
    return messages;
  }
}

BaseFeedParser.java

package com.nesim.test.rssparser;

import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

public abstract class BaseFeedParser implements FeedParser {

  // names of the XML tags
  static final String CHANNEL = "channel";
  static final String PUB_DATE = "pubDate";
  static final  String DESCRIPTION = "description";
  static final  String LINK = "link";
  static final  String TITLE = "title";
  static final  String ITEM = "item";
  static final  String DATES = "dates";
  private final URL feedUrl;

  protected BaseFeedParser(String feedUrl){
    try {
      this.feedUrl = new URL(feedUrl);
    } catch (MalformedURLException e) {
      throw new RuntimeException(e);
    }
  }

  protected InputStream getInputStream() {
    try {
      return feedUrl.openConnection().getInputStream();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}

FeedParser.java

package com.nesim.test.rssparser;

import java.util.List;

public interface FeedParser {
  List<Message> parse();
}
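
The Message class used by XmlPullFeedParser is not included above. The following is only a guess at its minimal shape, inferred from the setters the parser calls (setTitle, setLink, setDescription, setDate, setDates). The field types and the getters are assumptions on my part, and the real class may well do more, such as parsing the date.

```java
// Hypothetical sketch of the Message class; the real one is not shown in this answer.
public class Message {

    // All fields are kept as plain strings here; that is an assumption.
    private String title;
    private String link;
    private String description;
    private String date;
    private String dates;

    public void setTitle(String title) { this.title = title; }
    public void setLink(String link) { this.link = link; }
    public void setDescription(String description) { this.description = description; }
    public void setDate(String date) { this.date = date; }
    public void setDates(String dates) { this.dates = dates; }

    public String getTitle() { return title; }
    public String getLink() { return link; }
    public String getDescription() { return description; }
    public String getDate() { return date; }
    public String getDates() { return dates; }
}
```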
answered 2012-04-21T18:45:11.410

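It seems you did not change your code the way I suggested. If you insist on parsing this way, you need to fetch the XML first and manipulate it so that it parses correctly. I have also included a class at the end of this message that fetches the XML as text. Please change your code as follows and try printing the results.

If you change these lines, it will work.

Remove these lines from the getStories function: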

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL u = new URL(address);
Document doc = builder.parse(u.openStream());

In place of the removed lines, add these:

WebRequest response = new WebRequest("http://www.ncataggies.com/rss.dbml?db_oem_id=24500&RSS_SPORT_ID=74515&media=news", WebRequest.PostType.GET);
String htmltext = response.Get();

// Split the feed at the first <item> so the channel header is left untouched
int firstItemIndex = htmltext.indexOf("<item>");
String htmltextHeader = htmltext.substring(0, firstItemIndex);
String htmltextBody = htmltext.substring(firstItemIndex);

htmltextBody = htmltextBody.replace("<title>", "<title><![CDATA[ ");
htmltextBody = htmltextBody.replace("</title>", "]]></title>");

htmltextBody = htmltextBody.replace("<link>", "<link><![CDATA[ ");
htmltextBody = htmltextBody.replace("</link>", "]]></link>");

htmltextBody = htmltextBody.replace("<guid>", "<guid><![CDATA[ ");
htmltextBody = htmltextBody.replace("</guid>", "]]></guid>");
htmltextBody = htmltextBody.replace("&amp;", "&");
htmltext = htmltextHeader + htmltextBody;

Document doc = XMLfunctions.XMLfromString(htmltext);
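
XMLfunctions.XMLfromString is not shown in this answer. A minimal sketch of what such a helper might look like, using only the standard javax.xml.parsers API, is given below. The class name XmlFromStringSketch and the methods xmlFromString, wrapTitles and firstTitle are hypothetical names of mine, and the sample feed string is made up for the demonstration. For brevity the sketch applies the CDATA wrapping to <title> only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlFromStringSketch {

    // Hypothetical stand-in for XMLfunctions.XMLfromString: parses XML held in a String.
    static Document xmlFromString(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Same string manipulation as above, restricted to <title>:
    // wrap each title in CDATA, then turn "&amp;" back into "&" in the item section.
    static String wrapTitles(String feed) {
        int firstItemIndex = feed.indexOf("<item>");
        if (firstItemIndex < 0) {
            return feed; // no items, nothing to rewrite
        }
        String header = feed.substring(0, firstItemIndex);
        String body = feed.substring(firstItemIndex)
                .replace("<title>", "<title><![CDATA[ ")
                .replace("</title>", "]]></title>")
                .replace("&amp;", "&");
        return header + body;
    }

    // Parses the rewritten feed and returns the text of the first <title> inside an item.
    static String firstTitle(String feed) {
        try {
            Document doc = xmlFromString(wrapTitles(feed));
            // trim() drops the space that the "<![CDATA[ " replacement introduces
            return doc.getElementsByTagName("title").item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample feed, not the real one
        String feed = "<rss><channel><item>"
                + "<title>A&amp;T To Play Four Against Bears</title>"
                + "</item></channel></rss>";
        System.out.println(firstTitle(feed));
    }
}
```

Note that this string rewriting relies on every "&" in the item section ending up inside a CDATA section. An element that is not wrapped and contains "&amp;" would become malformed XML after the last replace, so the approach is tied to the layout of this particular feed.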

WebRequest.java

package com.nesim.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.UnknownHostException;
import java.nio.charset.Charset;

import org.apache.http.HttpResponse;
import org.apache.http.client.CookieStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;


public class WebRequest {
  public enum PostType{
    GET, POST;
  }

  public String _url;
  public String response = "";
  public PostType _postType;
  CookieStore _cookieStore = new BasicCookieStore();

  public WebRequest(String url) {
    _url = url;
    _postType = PostType.POST;
  }

  public WebRequest(String url, CookieStore cookieStore) {
    _url = url;
    _cookieStore = cookieStore;
    _postType = PostType.POST;
  }

  public WebRequest(String url, PostType postType) {
    _url = url;
    _postType = postType;
  }

  public String Get() {
    HttpClient httpclient = new DefaultHttpClient();

    try {
      // Create local HTTP context
      HttpContext localContext = new BasicHttpContext();

      // Bind custom cookie store to the local context
      localContext.setAttribute(ClientContext.COOKIE_STORE, _cookieStore);

      HttpResponse httpresponse;
      if (_postType == PostType.POST)
      {
        HttpPost httppost = new HttpPost(_url);
        httpresponse = httpclient.execute(httppost, localContext);
      }
      else
      {
        HttpGet httpget = new HttpGet(_url);
        httpresponse = httpclient.execute(httpget, localContext);
      }

      StringBuilder responseString = inputStreamToString(httpresponse.getEntity().getContent());

      response = responseString.toString();
    }
    catch (UnknownHostException e) {
      e.printStackTrace();
    }
    catch (Exception e) {
      e.printStackTrace();
    }
    finally {
      // When HttpClient instance is no longer needed,
      // shut down the connection manager to ensure
      // immediate deallocation of all system resources
      httpclient.getConnectionManager().shutdown();
    }

    return response;
  }

  private StringBuilder inputStreamToString(InputStream is) throws IOException {
    String line = "";
    StringBuilder total = new StringBuilder();

    // Wrap a BufferedReader around the InputStream
    BufferedReader rd = new BufferedReader(new InputStreamReader(is,Charset.forName("iso-8859-9")));
    // Read response until the end
    while ((line = rd.readLine()) != null) {
      total.append(line);
    }

    // Return full string
    return total;
  }
}

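Important:

Don't forget to change the package name on the first line of WebRequest.java:

package com.nesim.test;

Result:

After making these changes, you will get the following: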

D/title:  Two Walk-Off Moments Lead To Two A&T Losses
D/description: The Lancers win in their last at-bat in both games of Saturday&#39;s doubleheader.
D/title:  A&T To Play Four Against Longwood
D/description: A&T baseball takes a break from conference play this weekend.
D/title:  Wilkerson Named MEAC Rookie of the Week
D/description: Wilkerson was 6-for-14 for the week of April 9-15.
D/title:  Lights, Camera, Action
D/description: A&T baseball set to play nationally televised game on ESPNU.
D/title:  Resilient Aggies Fall To USC Upstate
D/description: Luke Tendler extends his hitting streak to 10 games.

Your parsing returns these:

D/title  : Two Walk-Off Moments Lead To Two A
D/description: The Lancers win in their last at-bat in both games of Saturday&#39;s doubleheader.
D/title  : A
D/description: A&amp;T baseball takes a break from conference play this weekend.
D/title  : Wilkerson Named MEAC Rookie of the Week
D/description: Wilkerson was 6-for-14 for the week of April 9-15.
D/title  : Lights, Camera, Action
D/description: A&amp;T baseball set to play nationally televised game on ESPNU.
D/title  : Resilient Aggies Fall To USC Upstate
D/description: Luke Tendler extends his hitting streak to 10 games.
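
The truncated titles above are consistent with the title's text being split into several child nodes around the "&", with getCharacterDataFromElement reading only the first one. I have not verified that this is what happens on your device, so take it as an assumption. If it is the cause, an alternative that keeps the DOM approach would be to concatenate all text and CDATA children instead of reading only getFirstChild(). The sketch below builds such a split element by hand to show the difference. The names AllTextNodesSketch, textOf and splitTitle are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class AllTextNodesSketch {

    // Concatenates every text and CDATA child of the element, not just the first one.
    static String textOf(Element e) {
        if (e == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Builds a <title> whose text is split into three nodes, imitating the
    // assumed behaviour of a parser that breaks text at the "&".
    static Element splitTitle() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element title = doc.createElement("title");
            title.appendChild(doc.createTextNode("A"));
            title.appendChild(doc.createTextNode("&"));
            title.appendChild(doc.createTextNode("T To Play Four Against Bears"));
            return title;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        Element title = splitTitle();
        System.out.println("first child only: " + title.getFirstChild().getNodeValue());
        System.out.println("all children:     " + textOf(title));
    }
}
```

In RSSReader, the body of getCharacterDataFromElement could be replaced with a loop like the one in textOf. The standard Node.getTextContent() method is another option where it is available.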
answered 2012-04-22T22:22:40.777