html - Problem with StringEscapeUtils.unescapeHtml() when unescaping HTML entities on Android

Question

This is what I'm doing:

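import org.apache.commons.lang.StringEscapeUtils;

// Decodes HTML entities such as &#8212; (tags are left untouched).
public static String htmlToText(String inString)
{
    String noentity = StringEscapeUtils.unescapeHtml(inString);
    return noentity;
}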

This is where I call it:

String html = "<html><body>string 1<br />&#8212;<p>string 2</p></body></html>";
String nohtml = Utility.htmlToText(html);
Log.i("NON HTML STRING:", nohtml);

This is the output in the log:

10-13 12:38:12.121: INFO/NON HTML STRING:(300): <html><body>string 1<br />â<p>string 2</p></body></html>

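According to the reference at http://www.w3.org/TR/html4/sgml/entities.html, &#8212; should be replaced with "—" (the output I expect), not "â" (which is not what I want).

At first I used JSoup and the same thing happened. Thinking it was a bug, I switched to org.apache.commons.lang, and the same thing happened.

Does anyone know what is going on here? Am I missing something obvious?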

score 0 · Accepted Answer

Solved.....

The problem is in the Logcat output, not in the unescaping.
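
One plausible explanation for the "â", offered as an assumption rather than something confirmed in this thread: the em dash (U+2014) is three bytes in UTF-8 (E2 80 94), and a viewer that decodes those bytes with a single-byte charset such as ISO-8859-1 shows "â" followed by two invisible control characters. The sketch below reproduces that effect in plain Java; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Encodes the text as UTF-8, then deliberately decodes the bytes with the
    // wrong charset (ISO-8859-1) to imitate a viewer that misreads UTF-8.
    // Assumption: this is how the garbled log line came about.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u2014"); // em dash
        // Three chars: U+00E2 (â), then U+0080 and U+0094 (C1 control characters).
        System.out.println(garbled.length());
        System.out.println(garbled.charAt(0) == '\u00E2');
    }
}
```

If this assumption holds, the string returned by unescapeHtml() is already correct and only its rendering in the log is wrong.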

Setting a breakpoint showed me that the actual output is correct.
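
A way to check the same thing without a debugger is to log the code points of the string instead of the string itself, so the result does not depend on how the log viewer renders non-ASCII characters. This is a minimal sketch; the helper name toCodePoints is hypothetical and not part of Commons Lang or Android.

```java
final class CodePointDump {
    // Renders each code point as U+XXXX (ASCII only), so the output is
    // unaffected by the charset a log viewer or terminal uses for display.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance past surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "1", an em dash, then "<"
        System.out.println(toCodePoints("1\u2014<")); // prints "U+0031 U+2014 U+003C"
    }
}
```

In the code from the question this could be called as Log.i("NON HTML STRING:", CodePointDump.toCodePoints(nohtml)); a U+2014 in the output means the entity was unescaped correctly.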

This is the second time the Logcat tool has led me astray......
