You can download the page and then parse its HTML content. Downloading the page should be easy, for example:
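import java.io.ByteArrayOutputStream;
import java.net.URI;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.StatusLine;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;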
String url = "http://www.whatever.com";
String html = "";
HttpClient httpclient = new DefaultHttpClient();
HttpGet request = new HttpGet(new URI(url));
HttpResponse response = httpclient.execute(request);
StatusLine statusLine = response.getStatusLine();
if (statusLine.getStatusCode() == HttpStatus.SC_OK) {
    // Copy the response body into memory, then turn it into a String
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    response.getEntity().writeTo(out);
    out.close();
    html = out.toString();
}
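If the Apache client classes are not available in your project, roughly the same download can be sketched with the standard java.net.HttpURLConnection instead. This is a swapped-in alternative, not the code above: the class name PageDownloader and the helpers readAll and download are made up for illustration, and decoding the body as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library
public class PageDownloader {

    /** Reads the whole stream into a String, decoding it as UTF-8 (an assumption). */
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Returns the page body, or "" if the server does not answer 200 OK. */
    public static String download(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                return "";
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling PageDownloader.download("http://www.whatever.com") would then stand in for the snippet above. It returns an empty string on a non-200 status, which mirrors how html stays "" in the original.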
This network code cannot run on the main thread, so you need to run it in an AsyncTask or a Runnable.
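As a rough sketch of moving the work off the calling thread, here is one way to do it with plain java.util.concurrent rather than AsyncTask. The class name BackgroundFetch and the method fetchAsync are made up for illustration, and the download itself is passed in as a Supplier so the threading part stays separate. On Android, CompletableFuture assumes a reasonably recent API level, and anything that touches the UI would still have to be posted back to the main thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper class, not part of any library
public class BackgroundFetch {

    /**
     * Runs the given download on a single worker thread and
     * returns a future that completes with its result.
     */
    public static CompletableFuture<String> fetchAsync(Supplier<String> download) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CompletableFuture<String> result = CompletableFuture.supplyAsync(download, executor);
        // No further tasks are accepted; the one already submitted still runs
        executor.shutdown();
        return result;
    }
}
```

A caller could write BackgroundFetch.fetchAsync(() -> "...").thenAccept(html -> { /* parse here */ }), where the lambda body does the actual download. The thenAccept callback is not guaranteed to run on the main thread.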
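For parsing, you can use JSoup. The hard part is not JSoup itself, which is easy to use, but working out the structure of the document you want to take apart. Here is a JSoup hello world: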
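import android.util.Log;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

String html = "<html><head></head><body><p>asdasd1</p><p>asdasd2</p></body></html>";
Document doc = Jsoup.parse(html);
Elements p = doc.select("p"); // every <p> element in the document
for (int i = 0; i < p.size(); i++) {
    Log.i("p", p.get(i).text()); // logs the text of each paragraph
}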