android - Extracting a specific block of text from a website into an Android application

Question

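I am developing an Android application that has to extract data from a website. The extracted data will be displayed in a TextView in the app.

After trying every approach I could find through Google and Stack Overflow, I still cannot get the data. If anyone has already done this, please share how.

Website: https://www.amrita.edu/campus/bengaluru

From this site, I want to extract the data in the Latest News and Upcoming Events blocks.

Here is the code. I am using jsoup for the extraction: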

package out.in;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.w3c.dom.Document;
import android.app.Activity;
import android.os.Bundle;
import android.sax.Element;
import android.widget.TextView;
import android.widget.Toast;

public class HtmlExtracterActivity extends Activity {

    // url
    static final String URL = "https://www.amrita.edu/campus/bengaluru";

    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        try {
            ((TextView) findViewById(R.id.tv)).setText(getdata());
        } catch (Exception ex) {
            ((TextView) findViewById(R.id.tv)).setText("Error");
        }
    }

    protected String getdata() throws Exception {
        String result = "";
        // get html document structure
        Document document = (Document) Jsoup.connect(URL).get();

        // selector query
        // ********* Need help
        // check results
        // ********* Need help
        return result;
    }
}

I have granted the INTERNET permission in the manifest file, and the layout XML file is as follows:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:orientation="vertical"
    android:layout_width="fill_parent"
    android:layout_height="fill_parent">

    <TextView
        android:id="@+id/tv"
        android:text=" "
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"></TextView>
</LinearLayout>

Thank you sincerely in advance for any help.

score 0 · Accepted Answer

You did not mention the exact problem you are facing. Did you try looking at what is returned here:

Document document = (Document) Jsoup.connect(URL).get();
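
One way to see what actually fails is to display the exception's class and message instead of the fixed "Error" string from the question's catch block. The following is only a minimal sketch: `ErrorText` and `describe` are hypothetical names introduced for illustration and are not part of Android or jsoup.

```java
// Hypothetical helper (an assumption for illustration, not from the question or from jsoup).
class ErrorText {

    // Returns the exception's class name, followed by its message when one is present.
    static String describe(Throwable t) {
        String name = t.getClass().getName();
        String message = t.getMessage();
        return (message == null || message.isEmpty()) ? name : name + ": " + message;
    }

    public static void main(String[] args) {
        try {
            // Stand-in for the failing call; the real code would call getdata() here.
            throw new java.io.IOException("HTTP error fetching URL");
        } catch (Exception ex) {
            // In the Activity, this string would be passed to setText(...) instead of "Error".
            System.out.println(describe(ex));
        }
    }
}
```

In the question's Activity the catch block would then call `setText(ErrorText.describe(ex))`. The class name alone can be telling: an `android.os.NetworkOnMainThreadException` would suggest the request is running on the UI thread, and a `java.lang.ClassCastException` would point at the `(Document)` cast, since the question imports `org.w3c.dom.Document` while `Jsoup.connect(URL).get()` returns an `org.jsoup.nodes.Document`.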

I assume the cause may be the missing User-Agent in that Jsoup.connect(URL).get() call. Please try this, and let us know if you still get an error:

Response response = Jsoup.connect(location)
        .ignoreContentType(true)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
        .referrer("http://www.google.com")
        .timeout(12000)
        .followRedirects(true)
        .execute();

Document doc = response.parse();

User-Agent

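Use a recent User-Agent string. Here is the full list: http://www.useragentstring.com/pages/Firefox/.

Timeout

Also don't forget to add a timeout, because downloading the page sometimes takes longer than the default timeout allows.

Referrer

Set the referrer to Google.

Follow redirects

Follow redirects to reach the page.

execute() instead of get()

Use execute() to get a Response object. It lets you check the content type and status code in case of an error.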
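
The check on status code and content type could look roughly like the sketch below. It is written as plain Java so it runs without jsoup; in real code the two arguments would come from `response.statusCode()` and `response.contentType()` on the jsoup Response. `ResponseCheck` and `looksLikeHtml` are hypothetical names, and the accepted content types are an assumption about what is worth parsing.

```java
// Hypothetical helper (an assumption for illustration, not part of jsoup).
class ResponseCheck {

    // True when the status is 2xx and the content type announces HTML or XHTML.
    static boolean looksLikeHtml(int statusCode, String contentType) {
        if (statusCode < 200 || statusCode >= 300) {
            return false; // redirects that were not followed, client and server errors
        }
        if (contentType == null) {
            return false; // the server sent no Content-Type header
        }
        String type = contentType.toLowerCase(java.util.Locale.ROOT);
        return type.startsWith("text/html") || type.startsWith("application/xhtml+xml");
    }

    public static void main(String[] args) {
        // With jsoup: looksLikeHtml(response.statusCode(), response.contentType())
        System.out.println(looksLikeHtml(200, "text/html; charset=UTF-8")); // true
        System.out.println(looksLikeHtml(200, "application/json"));         // false
        System.out.println(looksLikeHtml(404, "text/html"));                // false
    }
}
```

Only call `response.parse()` when the check passes. The content-type test matters here because `ignoreContentType(true)` tells jsoup to accept any response type. As far as I know, jsoup by default throws an exception on HTTP error statuses unless `ignoreHttpErrors(true)` is set, so the status test is mostly relevant when that option is enabled.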

Source: https://stackoverflow.com/a/20284953/1262177
