I am reading a Wikipedia XML file, and I have to strip the markers from anything that is a list item. For example, given the following string:
String text = ": definition list\n"
    + "** some list item\n"
    + "# another list item\n"
    + "[[Category:1918 births]]\n"
    + "[[Category:2005 deaths]]\n"
    + "[[Category:Scottish female singers]]\n"
    + "[[Category:Billy Cotton Band Show]]\n"
    + "[[Category:Deaths from Alzheimer's disease]]\n"
    + "[[Category:People from Glasgow]]";
Here, I want to remove the *, # and : markers, but not the colon where it says Category. The output should look like this:
String outtext = "definition list\n"
    + "some list item\n"
    + "another list item\n"
    + "[[Category:1918 births]]\n"
    + "[[Category:2005 deaths]]\n"
    + "[[Category:Scottish female singers]]\n"
    + "[[Category:Billy Cotton Band Show]]\n"
    + "[[Category:Deaths from Alzheimer's disease]]\n"
    + "[[Category:People from Glasgow]]";
I am using the following code:
Pattern pattern = Pattern.compile("(^\\*+|#+|;|:)(.+)$");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    String outtext = matcher.group(0);
    outtext = outtext.replaceAll("(^\\*+|#+|;|:)\\s", "");
    return(outtext);
}
This does not work. Can you point out what I should do?
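
For reference, here is a minimal sketch of one possible approach, not a definitive fix. Three things in the code above seem to get in the way: the pattern is compiled without multiline mode, so ^ and $ anchor only to the start and end of the whole input; the ^ applies only to the first alternative, so #+, ; and : can match anywhere in a line, including the colon in [[Category:...]]; and the return inside the loop exits on the first match. The sketch assumes that a list marker is any run of *, #, ; or : at the very start of a line, and that spaces or tabs directly after the marker should go too. The names ListMarkerStripper and stripListMarkers are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

public class ListMarkerStripper {
    // (?m) enables MULTILINE, so ^ matches at the start of every line.
    // [*#;:]+ matches one or more leading list markers.
    // [ \t]* removes the spaces or tabs after them, but never a line break.
    private static final Pattern LEADING_MARKERS =
            Pattern.compile("(?m)^[*#;:]+[ \\t]*");

    // Hypothetical helper: strips the markers from every line in one pass.
    static String stripListMarkers(String text) {
        return LEADING_MARKERS.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = ": definition list\n"
                + "** some list item\n"
                + "# another list item\n"
                + "[[Category:1918 births]]\n"
                + "[[Category:People from Glasgow]]";
        System.out.println(stripListMarkers(text));
    }
}
```

Because the character class is anchored with ^, the colon inside [[Category:...]] is left alone: those lines start with [, so the pattern never matches on them.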