java - Parsing specific text from a page using the Jericho HTML parser

Question

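I am having trouble retrieving a specific piece of text from a page. The example I am using is the patent assignee summary.

If you visit the site, you will see a "Total: 82" (the number of hits for the search criterion SASA). I need to get this number. I am using the Jericho HTML parser, but I can't find any function that does this.

Can someone help me with this? I really need to get this number from the page.

Thanks in advance - Sasa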

score 0 · Accepted Answer

If you can switch to Jsoup:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

/* Connect to the URL and parse the response into a 'Document' (get() throws IOException) */
Document doc = Jsoup.connect("http://assignments.uspto.gov/assignments/q?db=pat&qt=asne&reel=&frame=&pat=&pub=&asnr=&asnri=&asne=sasa&asnei=&asns=").get();

/* Select the required tag and print its text */
System.out.println(doc.select("p.t2").first().text());

Done!

Output:

Total: 83 (the value on the website has changed)

The selector explained:
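The snippet above prints the whole line ("Total: 83"), while the question asks for the number itself. A minimal sketch of that last step, using only the standard library, might look like the following. The class name TotalExtractor and the method extractTotal are hypothetical helpers and not part of Jsoup or Jericho, and the sketch assumes the text keeps the "Total: <digits>" shape shown in the output.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jsoup or Jericho.
class TotalExtractor {
    // Matches the label "Total:" followed by optional whitespace and a run of digits.
    private static final Pattern TOTAL = Pattern.compile("Total:\\s*(\\d+)");

    // Returns the number after "Total:", or an empty OptionalInt if the label is missing.
    // Assumes the count fits in an int; a longer digit run would throw NumberFormatException.
    static OptionalInt extractTotal(String text) {
        Matcher m = TOTAL.matcher(text);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // In practice the argument would be the text returned by
        // doc.select("p.t2").first().text() from the answer.
        OptionalInt total = extractTotal("Total: 83");
        System.out.println(total.isPresent() ? total.getAsInt() : -1);
    }
}
```

Returning an OptionalInt rather than a bare int makes the "label not found" case explicit, which helps if the page layout changes.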

doc.select("p.t2") // Select every 'p' tag with the class 't2' in the document
   .first() // Get the first match (there are two on the website, but the first one is the required one)
   .text() // Get the text of this element

Documentation:
