java - Groovy XmlSlurper with TagSoup and non-breaking space values

Question

I am parsing some HTML4 with Groovy's XmlSlurper, backed by the TagSoup Parser.

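I can successfully get the text() of a node, but the HTML &nbsp; entities are giving me trouble when I test the value for equality against another value. Specifically, .trim() does not actually strip all the whitespace from the string. The characters on either side of the value look like whitespace to me (see the code below), but String.trim() does not trim them as I expect. As the code sample shows, Character.isSpaceChar() identifies the first character of the string as a space character.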

Why does String.trim() not trim this value that I get from XmlSlurper?

@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser

def html = '''
<html>
<body>
<span id="interested">&nbsp;hello&nbsp;</span>
</body>
</html>
'''

def slurper = new XmlSlurper(new Parser() )
def document = slurper.parseText(html)

def value = document.'**'.find { it['@id'] == 'interested' }.text()

println "value=[${value}]"
println "first char isWhitespace? ${Character.isWhitespace(value.charAt(0))}"
println "first char isSpaceChar? ${Character.isSpaceChar(value.charAt(0))}"
assert 'hello' == value.trim()

Output:

value=[ hello ]
first char isWhitespace? false
first char isSpaceChar? true
Exception thrown

Assertion failed: 

assert 'hello' == value.trim()
               |  |     |
               |  |      hello 
               |   hello 
               false

I am using Groovy Version: 2.3.6 JVM: 1.8.0 Vendor: Oracle Corporation OS: Mac OS X

score 3 · Accepted Answer

Here is your corrected example:

@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser

def html = '''
<html>
<body>
<span id="interested">&nbsp;hello&nbsp;</span>
</body>
</html>
'''

def slurper = new XmlSlurper(new Parser() )
def document = slurper.parseText(html)

def value = document.'**'.find { it['@id'] == 'interested' }.text()

println "value=[${value}]"
println "first char isWhitespace? ${Character.isWhitespace(value.charAt(0))}"
println "first char isSpaceChar? ${Character.isSpaceChar(value.charAt(0))}"
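// trim() leaves the non-breaking spaces (U+00A0) in place, so the next two lines print the same results as above
value = value.trim()
println "first char isWhitespace? ${Character.isWhitespace(value.charAt(0))}"
println "first char isSpaceChar? ${Character.isSpaceChar(value.charAt(0))}"
// Replace each U+00A0 (decimal 160) with an ordinary space, which trim() can then remove
assert 'hello' == value.replaceAll(String.valueOf((char) 160), " ").trim()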

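An explanation can be found here (space vs. non-breaking space). In short, &nbsp; is parsed to the non-breaking space character U+00A0 (decimal 160). String.trim() only strips leading and trailing characters with code points up to U+0020, and Character.isWhitespace() explicitly excludes non-breaking spaces, whereas Character.isSpaceChar() returns true because U+00A0 belongs to the Unicode space separator category.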
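The behaviour can be reproduced without TagSoup at all. The following is a minimal sketch, assuming the parsed text is simply "hello" wrapped in U+00A0 characters; the stripSpaces closure is a hypothetical helper introduced here for illustration, not part of Groovy or TagSoup.

```groovy
// U+00A0 (decimal 160) is the non-breaking space that &nbsp; stands for
def nbsp = '\u00A0'
def value = nbsp + 'hello' + nbsp

// String.trim() only removes characters with code points <= U+0020,
// so both non-breaking spaces survive
assert value.trim() == value
assert value.trim().length() == 7

// U+00A0 is a Unicode space separator, so isSpaceChar is true,
// but isWhitespace explicitly excludes non-breaking spaces
assert Character.isSpaceChar(value.charAt(0))
assert !Character.isWhitespace(value.charAt(0))

// Hypothetical helper: strip leading and trailing characters that are
// either regex whitespace (\s) or Unicode separators (\p{Z})
def stripSpaces = { String s ->
    s.replaceAll('^[\\s\\p{Z}]+|[\\s\\p{Z}]+$', '')
}

assert stripSpaces(value) == 'hello'
```

Unlike replacing every U+00A0 with a space, this variant only touches the ends of the string, so a non-breaking space inside the value would be left alone. Which behaviour is preferable depends on how the value is compared afterwards.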
