java - How do I use XPath to get values from a specific site and store them in an array? (Android)

Question

I'd like to ask what is wrong with my code. I want to fetch a result from an HTML page and store the value in a string, or later in an array. Thanks.

09-05 16:36:41.221: I/test(22697): plan failed 1org.xml.sax.SAXParseException: attr value delimiter missing! (position:START_TAG @1:166 in java.io.StringReader@4061bc98)
09-05 16:36:41.221: I/test(22697): plan failed 1a @1:166 in java.io.StringReader@4061bc98)
09-05 16:36:41.231: W/System.err(22697): at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:151)
09-05 16:36:41.231: W/System.err(22697): at com.asiatype.boracay.CurrencyActivity$DownloadData.doInBackground(CurrencyActivity.java:194)
09-05 16:36:41.231: W/System.err(22697): at com.asiatype.boracay.CurrencyActivity$DownloadData.doInBackground(CurrencyActivity.java:1)
09-05 16:36:41.231: W/System.err(22697): at android.os.AsyncTask$2.call(AsyncTask.java:185)
09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.

        String s,link;
        String theResult = "";
        link="http://www.bsp.gov.ph/statistics/sdds/exchrate.htm";
        Document doc;
        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(link);
        HttpResponse response;
        try {
            response = client.execute(request);
            InputStream in = response.getEntity().getContent();
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            StringBuilder str = new StringBuilder();
            String line = null;
            while((line = reader.readLine()) != null)
            {
                str.append(line);
            }
            in.close();
            htmlSource = str.toString();
        } catch (ClientProtocolException e2) {
            // TODO Auto-generated catch block
            e2.printStackTrace();
        } catch (IOException e2) {
            // TODO Auto-generated catch block
            e2.printStackTrace();
        }


        try {
            doc = DocumentBuilderFactory.newInstance()
                      .newDocumentBuilder().parse(new InputSource(new StringReader(htmlSource)));
            XPathExpression xpath = XPathFactory.newInstance()
                      .newXPath().compile("//div/table/tbody/tr[child::td[contains(text(),\"USD\")]]/td[15]");
            htmlResult = (String) xpath.evaluate(doc, XPathConstants.STRING);
        } catch (SAXException e1) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 1"+e1);
            Log.i("test", "plan failed 1a "+ htmlSource);
            Log.i("test", "plan failed 1a "+ htmlResult);
            e1.printStackTrace();
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 2");

            e1.printStackTrace();
        } catch (ParserConfigurationException e1) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 3");

            e1.printStackTrace();
        } catch (XPathExpressionException e) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 4");

            e.printStackTrace();
        }

score 1 · Accepted Answer

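The source HTML file you are using as input is not well-formed XML, which is why a SAXParseException is thrown: it is telling you that the value delimiter (the quote) of an XML attribute is missing.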

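HTML and XML are very different. For example, HTML can have missing or mismatched closing tags and unquoted attribute values, whereas XML allows neither. It is therefore strongly recommended not to try to parse HTML as XML: an XML parser cannot cope with all the inconsistencies that HTML permits.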
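The difference can be reproduced with the JDK's own XML parser. The sketch below is only an illustration: the class name `StrictXmlParseDemo`, the helper `isWellFormedXml`, and the two sample strings (including the made-up rate) are assumptions, not taken from the question or from the real page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of the question's code.
class StrictXmlParseDemo {

    /** Returns true if the markup parses as well-formed XML, false if the parser rejects it. */
    static boolean isWellFormedXml(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Fatal well-formedness errors surface as SAXParseException, a SAXException subclass.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Typical HTML: unquoted attribute value and missing closing tags (sample data, made up).
        String htmlStyle = "<table border=1><tr><td>USD<td>41.50</table>";
        // The same content written as well-formed XML.
        String xmlStyle = "<table border=\"1\"><tr><td>USD</td><td>41.50</td></tr></table>";

        System.out.println("HTML-style markup parses: " + isWellFormedXml(htmlStyle));
        System.out.println("XML-style markup parses:  " + isWellFormedXml(xmlStyle));
    }
}
```

The first string is rejected at the unquoted `border=1` attribute, which is the same kind of failure the log in the question reports.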

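There are a few alternative ways to solve this:

From "Reading HTML file to DOM tree using Java": use Neko to try to turn the HTML into valid XML, which would let you keep your existing XML parsing code to find your data.
From the same question as above: use JTidy to parse the HTML into a DOM tree and use DOM methods to find your data. See "xml dom parser in java?" for some Java DOM parsers.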
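Once the page is available as a well-formed DOM, by whichever of the routes above, the values can be collected into an array by evaluating the XPath as a node set rather than as a single string. The following is a minimal sketch using only the standard `javax.xml` APIs; the class `XPathToArrayDemo`, its helpers, the XPath expression, and the sample table with made-up rates are all assumptions for illustration, and the cleaned-up markup stands in for what a tool such as Neko or JTidy might produce.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the question's code.
class XPathToArrayDemo {

    /** Parses well-formed markup into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates the expression as a node set and returns each node's trimmed text. */
    static String[] selectValues(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        String[] values = new String[nodes.getLength()];
        for (int i = 0; i < nodes.getLength(); i++) {
            values[i] = nodes.item(i).getTextContent().trim();
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for already cleaned-up HTML; the rates are made-up sample data.
        String cleaned = "<html><body><table>"
                + "<tr><td>USD</td><td>41.50</td><td>41.75</td></tr>"
                + "<tr><td>JPY</td><td>0.52</td><td>0.53</td></tr>"
                + "</table></body></html>";

        // All cells after the first one in the row whose first cell is "USD".
        String[] usd = selectValues(parse(cleaned), "//tr[td[1]='USD']/td[position()>1]");
        System.out.println(Arrays.toString(usd)); // prints [41.50, 41.75]
    }
}
```

Note that `tbody` elements are often inserted by browsers and may be absent from the markup actually downloaded, so an expression such as the one in the question may need adjusting to the real structure of the page.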

