
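I'm developing an application that needs to fetch the source code of a web page from a link and then parse the HTML of that page.

Can you give me some examples, or tell me where to start writing such an application?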


8 Answers


You can use HttpClient to perform an HTTP GET and retrieve the HTML response, like this:

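import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

// Run this off the main thread; Android throws NetworkOnMainThreadException otherwise.
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);

String html = "";
InputStream in = response.getEntity().getContent();
try {
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    StringBuilder str = new StringBuilder();
    String line = null;
    while ((line = reader.readLine()) != null) {
        str.append(line).append('\n'); // readLine() strips the line terminator; put it back
    }
    html = str.toString();
} finally {
    in.close(); // close the stream even if reading fails
}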
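Because `readLine()` strips line terminators, a `readLine()`-based loop only keeps the page's line breaks if it adds them back itself. A minimal sketch of an alternative that sidesteps this by copying characters in blocks, so the body comes back exactly as sent. The class name `StreamToString` is hypothetical, and the UTF-8 charset is an assumption for the demo; in practice the stream would be `response.getEntity().getContent()` or `connection.getInputStream()`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
public class StreamToString {

    // Reads the whole stream into a String using an explicit charset.
    // Copying raw chars (instead of readLine()) keeps every line break intact.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for an HTTP response body (assumed UTF-8).
        String html = "<html>\n<body>h\u00e9llo</body>\n</html>\n";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in, StandardCharsets.UTF_8).equals(html)); // prints "true"
    }
}
```

With Apache HttpClient specifically, `EntityUtils.toString(response.getEntity())` already reads the entity into a String, so a hand-written loop is optional there.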
answered Mar 11, 2010 at 14:36

I'd recommend jsoup.

According to their website:

Fetch the Wikipedia homepage, parse it to a DOM, and select the headlines from the "In the news" section into a list of Elements (online sample):

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Getting started:

  1. Download the jsoup jar (core library)
  2. Read the cookbook introduction
answered Sep 26, 2013 at 14:33

This question is a bit old, but I figured I'd post my answer since DefaultHttpClient, HttpGet, etc. are deprecated. Given a URL, this function fetches and returns the page's HTML.

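import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public static String getHtml(String url) throws IOException {
    // Build and set timeout values for the request.
    URLConnection connection = (new URL(url)).openConnection();
    connection.setConnectTimeout(5000);
    connection.setReadTimeout(5000);
    connection.connect();

    // Read and store the result line by line, then return the entire string.
    // try-with-resources closes the stream even if reading fails.
    // UTF-8 is assumed here; the server's actual charset is in the Content-Type header.
    StringBuilder html = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
        for (String line; (line = reader.readLine()) != null; ) {
            html.append(line).append('\n'); // restore the line break readLine() removed
        }
    }
    return html.toString();
}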
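Whatever charset the reader is built with is a guess unless it comes from the response itself. `URLConnection.getContentType()` returns the raw `Content-Type` header (for example `text/html; charset=ISO-8859-1`). A minimal sketch of pulling the charset out of that value, with a fallback; the class and method names are hypothetical, and it ignores any `<meta charset>` declared inside the HTML.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

// Hypothetical helper, not part of the JDK or Android.
public class ContentTypeCharset {

    // Returns the charset named in a Content-Type value, or `fallback` when the
    // header is null, has no charset parameter, or names an unknown charset.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8)); // ISO-8859-1
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));                     // UTF-8
    }
}
```

Inside `getHtml`, the result would be passed as the second argument of `new InputStreamReader(connection.getInputStream(), ...)`.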
answered Jul 15, 2015 at 2:54
public class RetrieveSiteData extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        // Runs on a background thread; concatenates the HTML of every URL passed to execute().
        StringBuilder builder = new StringBuilder(100000);

        for (String url : urls) {
            DefaultHttpClient client = new DefaultHttpClient();
            HttpGet httpGet = new HttpGet(url);
            try {
                HttpResponse execute = client.execute(httpGet);
                InputStream content = execute.getEntity().getContent();

                // try-with-resources closes the stream when reading finishes or fails.
                try (BufferedReader buffer = new BufferedReader(new InputStreamReader(content))) {
                    String s;
                    while ((s = buffer.readLine()) != null) {
                        builder.append(s).append('\n');
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        return builder.toString();
    }

    @Override
    protected void onPostExecute(String result) {
        // Runs on the UI thread: use the downloaded HTML here.
    }
}
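This answer relies on `DefaultHttpClient`, which was removed from the Android SDK in Android 6.0 (API 23), and on `AsyncTask`, which is deprecated as of API 30. A minimal plain-Java sketch of the same "work in the background, then report back" shape using an `ExecutorService`. The `Fetcher` interface and `BackgroundFetcher` class are hypothetical; on Android the callbacks would typically be posted to the main thread (for example with a `Handler` on the main `Looper`), whereas this sketch invokes them on the worker thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical replacement for an AsyncTask subclass.
public class BackgroundFetcher {

    // Hypothetical functional interface; a method like getHtml(String) fits it.
    public interface Fetcher {
        String fetch(String url) throws Exception;
    }

    // One background thread for network work.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Runs fetcher.fetch(url) off the calling thread and reports exactly one outcome.
    public void load(String url, Fetcher fetcher,
                     Consumer<String> onSuccess, Consumer<Exception> onError) {
        executor.execute(() -> {
            String html;
            try {
                html = fetcher.fetch(url);
            } catch (Exception e) {
                onError.accept(e);
                return;
            }
            onSuccess.accept(html);
        });
    }

    // Stops accepting work and waits briefly for queued tasks to finish.
    public void shutdown() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BackgroundFetcher loader = new BackgroundFetcher();
        // A fake fetcher so the sketch runs without network access.
        loader.load("https://www.example.com",
                url -> "<html>" + url + "</html>",
                html -> System.out.println("got " + html),
                error -> System.out.println("failed: " + error));
        loader.shutdown();
    }
}
```

Unlike the `catch (Exception e) { e.printStackTrace(); }` pattern, failures here reach the caller through `onError` instead of silently producing an empty result.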
answered Jan 15, 2013 at 17:54

Call it like this:

new RetrieveFeedTask(new OnTaskFinished()
        {
            @Override
            public void onFeedRetrieved(String feeds)
            {
                //do whatever you want to do with the feeds
            }
        }).execute("http://enterurlhere.com");

RetrieveFeedTask.java

class RetrieveFeedTask extends AsyncTask<String, Void, String>
{
    String HTML_response= "";

    OnTaskFinished onOurTaskFinished;


    public RetrieveFeedTask(OnTaskFinished onTaskFinished)
    {
        onOurTaskFinished = onTaskFinished;
    }
    @Override
    protected void onPreExecute()
    {
        super.onPreExecute();
    }

    @Override
    protected String doInBackground(String... urls)
    {
        try
        {
            URL url = new URL(urls[0]); // enter your url here which to download

            URLConnection conn = url.openConnection();

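            // Open the stream and collect the lines in a StringBuilder; repeated
            // String += in a loop copies the whole string on every line.
            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream())))
            {
                StringBuilder sb = new StringBuilder();
                String inputLine;
                while ((inputLine = br.readLine()) != null)
                {
                    sb.append(inputLine).append('\n');
                }
                HTML_response = sb.toString();
            }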

        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        return HTML_response;
    }

    @Override
    protected void onPostExecute(String feed)
    {
        onOurTaskFinished.onFeedRetrieved(feed);
    }
}

OnTaskFinished.java

public interface OnTaskFinished
{
    public void onFeedRetrieved(String feeds);
}
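`OnTaskFinished` only has a success method, so when `doInBackground` catches an exception the caller just receives an empty or partial string. As a sketch of one alternative (a swapped-in technique, not what this answer uses), a method can return a `CompletableFuture<String>` (Java 8; Android API 24+) that carries either the HTML or the failure. `FeedFutures`, `retrieveFeed`, and the fake `download` method are hypothetical; on Android, UI updates would still need to happen on the main thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical future-based alternative to the OnTaskFinished callback.
public class FeedFutures {

    // Hypothetical stand-in for the real download (e.g. a URLConnection read).
    static String download(String url) throws IOException {
        if (url.isEmpty()) {
            throw new IOException("empty URL");
        }
        return "<html>" + url + "</html>";
    }

    // supplyAsync's Supplier cannot throw checked exceptions, so IOException is wrapped.
    public static CompletableFuture<String> retrieveFeed(String url, ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return download(url);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, executor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        retrieveFeed("http://enterurlhere.com", executor)
                .whenComplete((feed, error) -> {
                    // Exactly one of feed / error is non-null.
                    if (error != null) {
                        System.out.println("failed: " + error.getCause());
                    } else {
                        System.out.println(feed);
                    }
                })
                .join();
        executor.shutdown();
    }
}
```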
answered May 12, 2014 at 7:31

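If you look here and here, you'll see that you can't do it directly with the Android API; you need an external library...

If you do need an external library, you can choose one of the two linked above.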
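For real HTML parsing a library such as jsoup is the practical choice, but for a one-off check like reading the page title, plain Java can get by. A minimal sketch using a regular expression. This is string matching, not parsing: it does not decode entities such as `&amp;` and can be fooled by comments or scripts. The class name `TitleExtractor` is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical quick-and-dirty helper; not a substitute for an HTML parser.
public class TitleExtractor {

    // Matches <title ...>text</title>, case-insensitively and across line breaks.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed text of the first <title> element, if any.
    public static Optional<String> title(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><head><TITLE>\n  Example Domain\n</TITLE></head><body></body></html>";
        System.out.println(title(html).orElse("(no title)")); // prints "Example Domain"
    }
}
```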

answered Mar 11, 2010 at 9:06

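One of the answers on another SO post helped me. It doesn't read the page line by line; you get the whole response body as a single string. As a prerequisite, add the dependency "com.koushikdutta.ion:ion:2.2.1" to your project. Implement this code in an AsyncTask. If you want the returned result on the UI thread, pass it back through a callback interface.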

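Ion.with(getApplicationContext())
        .load("https://google.com/hashbrowns")
        .asString()
        .setCallback(new FutureCallback<String>() {
            @Override
            public void onCompleted(Exception e, String result) {
                if (e != null) {
                    // The request failed; result is null in this case.
                    e.printStackTrace();
                    return;
                }
                // result holds the page's full HTML. Example: extract a value after "user_id".
                // int s = result.lastIndexOf("user_id") + 9;
                // String st = result.substring(s, s + 5);
                // Log.e("USERID", st);
            }
        });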
answered May 15, 2018 at 7:09
    // Declared inside the Activity, next to onCreate().
    public class DownloadTask extends AsyncTask<String, Void, String> {

        @Override
        protected String doInBackground(String... urls) {
            HttpURLConnection urlConnection = null;
            try {
                URL url = new URL(urls[0]);
                // HttpURLConnection works for both http:// and https:// URLs;
                // casting to HttpsURLConnection fails for plain http.
                urlConnection = (HttpURLConnection) url.openConnection();

                StringBuilder result = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(urlConnection.getInputStream()))) {
                    String inputLine;
                    while ((inputLine = br.readLine()) != null) {
                        result.append(inputLine).append('\n');
                    }
                }
                return result.toString();
            } catch (Exception e) {
                e.printStackTrace();
                return "failed";
            } finally {
                if (urlConnection != null) {
                    urlConnection.disconnect();
                }
            }
        }

        @Override
        protected void onPostExecute(String result) {
            // Runs on the UI thread once the download has finished.
            Log.i("Result", result);
        }
    }

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        // execute() returns immediately; calling .get() here would block the UI thread
        // until the download finished.
        new DownloadTask().execute("https://www.example.com");
    }
answered Apr 14, 2020 at 16:54