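I ended up with a more mechanical approach. I don't know whether there is a better solution, but what I did was convert each character according to its code (I don't know what kind of codes RSS uses for special characters). Here is my logic: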
html = i.getDescription(); // some tag of rss feed
html = html.replaceAll("<(.*?)\\>"," ");//Removes all items in brackets
html = html.replaceAll("<(.*?)\\\n"," ");//Must be underneath the previous line
html = html.replaceFirst("(.*?)\\>", " ");//Removes any connected item to the last bracket
html = html.replaceAll("&nbsp;"," ");
html = html.replaceAll("&amp;"," ");
html = html.replaceAll("&quot;","'");
html = html.replaceAll("&ccedil;","ç");
html = html.replaceAll("&atilde;","ã");
html = html.replaceAll("&oacute;","ó");
html = html.replaceAll("&aacute;","á");
html = html.replaceAll("&eacute;","é");
html = html.replaceAll("&iacute;","í");
html = html.replaceAll("&ecirc;","ê");
html = html.replaceAll("&Eacute;","É");
With this logic I also remove the HTML tags.
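For what it's worth, the same idea can probably be written with one pass for tags and one pass for entities instead of one replaceAll per character. The following is only a sketch, not a tested drop-in: the class name RssTextCleaner, the method clean and the entity table are made up for illustration, and the table covers just the characters listed above plus a few common ones. It assumes Java 9 or later. Unlike the code above it decodes &amp; to "&" rather than to a space, and it also handles numeric codes such as &#231; in case the feed uses those.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class RssTextCleaner {

    // Named entities this sketch knows about; extend as the feed requires.
    // Values use \\u escapes so the source compiles under any file encoding.
    private static final Map<String, String> NAMED = Map.ofEntries(
            Map.entry("nbsp", " "),        // decoded to a plain space
            Map.entry("amp", "&"),
            Map.entry("quot", "\""),
            Map.entry("lt", "<"),
            Map.entry("gt", ">"),
            Map.entry("ccedil", "\u00e7"), // c with cedilla
            Map.entry("atilde", "\u00e3"), // a with tilde
            Map.entry("oacute", "\u00f3"), // o with acute
            Map.entry("aacute", "\u00e1"), // a with acute
            Map.entry("eacute", "\u00e9"), // e with acute
            Map.entry("iacute", "\u00ed"), // i with acute
            Map.entry("ecirc", "\u00ea"),  // e with circumflex
            Map.entry("Eacute", "\u00c9")  // E with acute
    );

    // [^>]* also matches line breaks, so tags split across lines are covered.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Matches &name;, &#123; and &#x7B; forms.
    private static final Pattern ENTITY =
            Pattern.compile("&(#\\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    static String clean(String html) {
        // 1. Replace every tag with a space.
        String noTags = TAG.matcher(html).replaceAll(" ");

        // 2. Decode entities one match at a time.
        Matcher m = ENTITY.matcher(noTags);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String decoded = decode(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);

        // 3. Collapse runs of whitespace left behind by removed tags.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Returns the decoded text, or the original entity if it is not recognised.
    private static String decode(String name, String original) {
        if (name.startsWith("#")) {
            boolean hex = name.length() > 1
                    && (name.charAt(1) == 'x' || name.charAt(1) == 'X');
            try {
                int cp = hex
                        ? Integer.parseInt(name.substring(2), 16)
                        : Integer.parseInt(name.substring(1));
                if (Character.isValidCodePoint(cp)) {
                    return new String(Character.toChars(cp));
                }
            } catch (NumberFormatException e) {
                // Number too large to parse: fall through and keep the entity.
            }
            return original;
        }
        return NAMED.getOrDefault(name, original);
    }
}
```

Usage would be something like html = RssTextCleaner.clean(i.getDescription()); in place of the chain of replaceAll calls. Entities that are not in the table are left untouched, so it is easy to spot which ones still need to be added.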