Am I missing something? Is there a better way to do this?
Input:
<span style="FONT-FAMILY: 'Lucida Sans','sans-serif'; COLOR: #003572; FONT-SIZE: 9pt;
mso-fareast-font-family: Calibri; mso-ansi-language: EN-US; mso-fareast-language: EN-US;
mso-bidi-language: AR-SA; mso-fareast-theme-font: minor-latin">Dr. Who is
<u>usually</u> available for consultations Mon - Thurs afternoons and Friday 9a-
12p at 555-1212. </span>
Desired output:
<span style="COLOR: #003572; FONT-SIZE: 9pt;">Dr. Who is <u>usually</u> available for consultations Mon - Thurs afternoons and Friday 9a-12p at 555-1212. </span>
My code so far:
// Clean up the HTML inside the weeklong note before writing it to the database
Whitelist wl = new Whitelist();
wl = Whitelist.simpleText();
wl.addTags("br");
wl.addTags("p");
wl.addTags("span");
wl.addAttributes(":all","style");
Document doc = Jsoup.parse("<html><head></head><body>" + ds.getWeeklongNote() + "</body></html>");
Elements e = doc.select("*");
for (Element el : e) {
    for (Attribute attr : el.attributes()) {
        if (attr.getKey().equals("span")) {
            String newValue = "";
            String s = attr.getValue();
            String[] values = s.split(";");
            for (String value : values) {
                if (value.startsWith("COLOR") || value.startsWith("FONT-SIZE")) {
                    newValue += attr.getKey() + "=" + attr.getValue() + ";";
                }
            }
            attr.setValue(newValue);
        }
    }
}
doc.html(e.outerHtml());
ds.setWeekLongNote(Jsoup.clean(doc.body().outerHtml(), wl));
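
For comparison, here is a minimal sketch of just the style-filtering step, written against the standard library only so it can be run without jsoup. Two things in the loop above look like they would stop it from producing the desired output: the attribute key is compared with "span" (a tag name) rather than "style", and after split(";") every declaration except the first begins with a space, so startsWith("COLOR") never matches. The class name StyleFilter and the method filterStyle are hypothetical names chosen for illustration, and the sketch assumes that no property value itself contains a semicolon.

```java
import java.util.Locale;
import java.util.Set;

class StyleFilter {
    // Property names to keep, lower-cased so matching is case-insensitive.
    private static final Set<String> ALLOWED = Set.of("color", "font-size");

    /** Returns the style string with every declaration dropped except the allowed properties. */
    static String filterStyle(String style) {
        StringBuilder kept = new StringBuilder();
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed fragment, e.g. whitespace after a trailing ';'
            }
            // Compare the trimmed, lower-cased property name exactly, not by prefix.
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (ALLOWED.contains(property)) {
                if (kept.length() > 0) {
                    kept.append(' ');
                }
                kept.append(declaration.trim()).append(';');
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String style = "FONT-FAMILY: 'Lucida Sans','sans-serif'; COLOR: #003572; FONT-SIZE: 9pt; "
                + "mso-fareast-font-family: Calibri; mso-ansi-language: EN-US";
        System.out.println(filterStyle(style)); // prints "COLOR: #003572; FONT-SIZE: 9pt;"
    }
}
```

With jsoup, one way to wire this in (an assumption, not tested against the code above) would be to loop over doc.select("[style]") and call el.attr("style", StyleFilter.filterStyle(el.attr("style"))) on each element before passing the body HTML to Jsoup.clean.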