java - Using JSoup to parse text between two different tags

Question

I have the following HTML...

<h3 class="number">
<span class="navigation">
6:55 <a href="/results/result.html" class="under"><b>&raquo;</b></a>
</span>This is the text I need to parse!</h3>

I can use the following code to extract the text from the h3 tag:

Element h3 = doc.select("h3").get(0);

Unfortunately, that gives me everything in that tag.

6:55 &raquo; This is the text I need to parse!

Can I use Jsoup to parse between different tags? Is there a best practice for doing this (regex?)

score 3 · Accepted Answer

(regex?)

No. As you can see from the answers to this question, you can't parse HTML with regular expressions.

Try this:

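// First <h3> in the document
Element h3 = doc.select("h3").get(0);
// Full text of the <h3>, including the text of the nested <span>
String h3Text = h3.text();
// Text of the nested <span> only (the time and the arrow)
String spanText = h3.select("span").get(0).text();
// Strip the span's text and trim the leftover leading space
String textBetweenSpanEndAndH3End = h3Text.replace(spanText, "").trim();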
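A possibly simpler alternative is jsoup's `Element.ownText()`, which returns only the text nodes that are direct children of an element and skips text inside child elements. Because the wanted text sits directly inside the `<h3>`, it can be read without subtracting the `<span>`'s text. This is a minimal sketch assuming the markup from the question; the class name `OwnTextDemo` and the helper `textAfterSpan` are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OwnTextDemo {
    // Hypothetical helper: returns the text owned directly by the first <h3>,
    // ignoring any text that lives inside child elements such as the <span>.
    static String textAfterSpan(String html) {
        Document doc = Jsoup.parse(html);
        Element h3 = doc.select("h3").first();
        if (h3 == null) {
            return "";
        }
        // ownText() skips the <span>, <a> and <b> children entirely
        return h3.ownText().trim();
    }

    public static void main(String[] args) {
        String html = "<h3 class=\"number\">\n"
                + "<span class=\"navigation\">\n"
                + "6:55 <a href=\"/results/result.html\" class=\"under\"><b>&raquo;</b></a>\n"
                + "</span>This is the text I need to parse!</h3>";
        System.out.println(textAfterSpan(html));
    }
}
```

If the `<h3>` ever had text both before and after the `<span>`, `ownText()` would join those pieces together; `Element.textNodes()` gives access to them individually.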

score 0 · Accepted Answer

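No, JSoup isn't made for this. It is meant to parse hierarchical structures. Searching for the text between a closing tag and an opening tag (or the other way around) has no meaning to JSoup. That is what regular expressions are for.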

However, before you throw regular expressions at the string, you should of course use JSoup to narrow it down as much as possible first.
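A minimal sketch of that two-step approach, assuming the `<h3>` contains a `<span>` followed by plain text: jsoup isolates the first `<h3>` and returns its inner HTML via `Element.html()`, and a regular expression then keeps only what follows the last closing `</span>`. Pretty printing is switched off so that `html()` returns the markup without re-indentation, and `Parser.unescapeEntities` turns any entities in the captured fragment back into characters. The class name, helper name and the regex itself are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class NarrowThenRegexDemo {
    // Greedy ".*" backtracks to the LAST </span>; (?s) lets "." match line breaks
    private static final Pattern AFTER_SPAN = Pattern.compile("(?s).*</span>(.*)");

    // Hypothetical helper: narrow to the first <h3> with jsoup, then regex its inner HTML
    static String textAfterSpan(String html) {
        Document doc = Jsoup.parse(html);
        // Keep html() output close to the source markup (no re-indentation)
        doc.outputSettings().prettyPrint(false);
        Element h3 = doc.select("h3").first();
        if (h3 == null) {
            return "";
        }
        Matcher m = AFTER_SPAN.matcher(h3.html());
        if (!m.matches()) {
            return ""; // no </span> inside the <h3>
        }
        // Decode entities in the captured fragment and drop surrounding whitespace
        return Parser.unescapeEntities(m.group(1), false).trim();
    }

    public static void main(String[] args) {
        String html = "<h3 class=\"number\">\n"
                + "<span class=\"navigation\">\n"
                + "6:55 <a href=\"/results/result.html\" class=\"under\"><b>&raquo;</b></a>\n"
                + "</span>This is the text I need to parse!</h3>";
        System.out.println(textAfterSpan(html));
    }
}
```

The regex only ever sees the inner HTML of one `<h3>`, which is the point of narrowing with JSoup first; it would still break if the trailing text itself contained markup.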
