java - Remove everything in a string up to a certain substring, and optionally remove a string that follows it

Question

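I'm looking for a regular expression that removes everything up to and including the first &emsp;, and, if (new section) immediately follows it, removes that as well. The regular expression below doesn't seem to work. Why not, and how can I fix it?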

String removeEmsp =" &ldquo;[<centd>[</centd>]&sect;&ensp;431:10A&ndash;126&emsp;(new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
Pattern removeEmspPattern1 = Pattern.compile("(.*(&emsp;(\\(new section\\)))?)(.*)", Pattern.MULTILINE);
System.out.println(removeEmspPattern1.matcher(removeEmsp).replaceAll("$2"));

score 0 · Accepted Answer

Try this:

String removeEmsp =" &ldquo;[<centd>[</centd>]&sect;&ensp;431:10A&ndash;126&emsp;(new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
System.out.println(removeEmsp.replaceFirst("^.*?\\&emsp;(\\(new\\ssection\\))?", ""));
System.out.println(removeEmsp.replaceAll("^.*?\\&emsp;(\\(new\\ssection\\))?", ""));

Output:

[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.
[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.

It removes everything up to and including the first "&emsp;", plus the optional "(new section)" text that follows it, if present.
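A minimal sketch of the same idea with a precompiled pattern follows. The class and method names are hypothetical, and the `Pattern.DOTALL` flag is an added assumption so that the removed prefix may span line breaks; drop it to match the answer's behavior exactly. For comparison, in the question's pattern the leading greedy `.*` already consumes the whole input, so the optional group never captures anything.

```java
import java.util.regex.Pattern;

class EmspPrefixRemover {
    // "^.*?" lazily matches up to the FIRST "&emsp;"; the non-capturing
    // group then optionally consumes a directly following "(new section)".
    private static final Pattern PREFIX =
            Pattern.compile("^.*?&emsp;(?:\\(new\\ssection\\))?", Pattern.DOTALL);

    // Returns the input without the prefix, or unchanged if "&emsp;" is absent.
    static String stripPrefix(String input) {
        return PREFIX.matcher(input).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripPrefix("a&emsp;(new section)b&emsp;c")); // b&emsp;c
        System.out.println(stripPrefix("a&emsp;b"));                     // b
        System.out.println(stripPrefix("no marker"));                    // no marker
    }
}
```

Because the pattern is anchored with `^` and compiled without `MULTILINE`, it can match at most once, which is why `replaceFirst` and `replaceAll` print the same line in the answer above.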

score 0 · Accepted Answer

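Have you tried splitting the string? String.split creates an array of strings from a string, based on a delimiter.

After splitting the string, just pick the array element you need for the print statement.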
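The split approach could be sketched as follows. The class and method names are hypothetical, and handling the optional "(new section)" with `startsWith` is one possible choice rather than something this answer specifies.

```java
import java.util.regex.Pattern;

class EmspSplitter {
    private static final String OPTIONAL_TEXT = "(new section)";

    // Returns the text after the first "&emsp;", also dropping a directly
    // following "(new section)"; returns the input unchanged if "&emsp;" is absent.
    static String afterMarker(String input) {
        // A limit of 2 splits only at the first delimiter. Pattern.quote makes
        // the delimiter a literal, since String.split expects a regex.
        String[] parts = input.split(Pattern.quote("&emsp;"), 2);
        if (parts.length < 2) {
            return input; // delimiter not found
        }
        String rest = parts[1];
        return rest.startsWith(OPTIONAL_TEXT)
                ? rest.substring(OPTIONAL_TEXT.length())
                : rest;
    }

    public static void main(String[] args) {
        System.out.println(afterMarker("a&emsp;(new section)b")); // b
        System.out.println(afterMarker("a&emsp;b&emsp;c"));       // b&emsp;c
    }
}
```

Passing the limit matters: without it, a second "&emsp;" later in the string would split the remainder into further elements.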


score 0 · Accepted Answer

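Your regular expression is long and I don't want to debug it. But here is a hint: some characters have a special meaning in regular expressions. For example, parentheses define groups and square brackets define character classes. (A lone & is not special in Java's regex flavor; only && inside a character class means intersection, so escaping & is harmless but not required.) If you want such characters to be interpreted as literal characters rather than as regex commands, you must escape them by writing \ in front of them. But \ is also the escape character in Java string literals, so it has to be doubled.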

For example, to replace the ampersand with the letter A, you would write str.replaceAll("\\&", "A")
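To illustrate the escaping rule, here is a small sketch with hypothetical class and method names. It contrasts an unescaped pattern, a hand-escaped pattern, and `Pattern.quote`, which escapes a whole literal for you.

```java
import java.util.regex.Pattern;

class RegexEscapingDemo {
    // Unescaped: the parentheses form a capturing group, so the regex
    // matches only the text "new section" and leaves "()" behind.
    static String removeUnescaped(String input) {
        return input.replaceAll("(new section)", "");
    }

    // Escaped by hand: "\\(" in Java source is the regex "\(", a literal "(".
    static String removeEscaped(String input) {
        return input.replaceAll("\\(new section\\)", "");
    }

    // Pattern.quote turns an arbitrary string into a literal pattern.
    static String removeQuoted(String input, String literal) {
        return input.replaceAll(Pattern.quote(literal), "");
    }

    public static void main(String[] args) {
        String s = "x(new section)y";
        System.out.println(removeUnescaped(s));               // x()y
        System.out.println(removeEscaped(s));                 // xy
        System.out.println(removeQuoted(s, "(new section)")); // xy
        System.out.println("a&b".replaceAll("\\&", "A"));     // aAb
    }
}
```

In Java's regex flavor a lone `&` needs no escape, so `"\\&"` and `"&"` behave the same here; the escape matters for characters such as `(`, `)`, `[`, `]`, and `.`.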

Now you have all the information you need. Try starting with a simpler regular expression and then extend it to what you need. Good luck.

EDIT: BTW, parsing XML and/or HTML with regular expressions is possible but strongly discouraged. Use a dedicated parser for such formats.
