java - Regex to convert brackets and nested brackets within markup

Question

I want to write a regex that removes the brackets around [cent]:

String input1 = "this is a [cent] and [cent] string";
String output1 = "this is a cent and cent string"

But if the brackets are nested, as follows:

String input2="this is a [cent[cent] and [cent]cent] string"
String output2="this is a cent[cent and cent]cent string"

I can only use replaceAll on the string. How do I build the pattern in the code below, and what should the replacement string be?

Pattern rulerPattern1 = Pattern.compile("", Pattern.MULTILINE);
System.out.println(rulerPattern1.matcher(input1).replaceAll(""));

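Update: the nested brackets are well-formed and can be at most two levels deep, as in case 2.

Edit: if the string is "[<centd>[</centd>]purposes[<centd>]</centd>]", then the output should be <centd>[</centd> purposes <centd>]</centd>. Basically, if a bracket sits between the opening and closing centd tags, leave it there; otherwise remove it.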

score 6 · Accepted Answer

Description

This regex removes a bracket only when it has whitespace on exactly one side.

Regex: (?<=\s)[\[\]](?=\S)|(?<=\S)[\[\]](?=\s)

Replace with: the empty string
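
As a minimal sketch of how the expression above could be applied with replaceAll, assuming the hypothetical class name OneSidedBracketStripper and helper strip (neither appears in the original answer):

```java
import java.util.regex.Pattern;

class OneSidedBracketStripper {
    // Matches a bracket with whitespace on one side and a
    // non-whitespace character on the other side.
    private static final Pattern ONE_SIDED = Pattern.compile(
            "(?<=\\s)[\\[\\]](?=\\S)|(?<=\\S)[\\[\\]](?=\\s)");

    // Hypothetical helper: removes every bracket the pattern matches.
    static String strip(String input) {
        return ONE_SIDED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // prints: this is a cent and cent string
        System.out.println(strip("this is a [cent] and [cent] string"));
        // prints: this is a cent[cent and cent]cent string
        System.out.println(strip("this is a [cent[cent] and [cent]cent] string"));
    }
}
```

Because the lookarounds need a real character on both sides, a bracket at the very start or end of the string is left alone, which is what Sample 3 shows.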

Summary

Sample 1
- Input: this is a [cent[cent] and [cent]cent] string
- Output: this is a cent[cent and cent]cent string
Sample 2
- Input: this is a [cent[cent] and [cent]cent] string
- Output: this is a cent[cent and cent]cent string
Sample 3
- Input: [<cent>[</cent>] and [<cent>]Chemotherapy services.</cent>]
- Output: [<cent>[</cent> and <cent>]Chemotherapy services.</cent>]

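To address the edit to the question, this expression will:

- find [<centd>[</centd>] and replace it with <centd>[</centd> (likewise for the ] variant)
- find [<centd>] or [</centd>] and remove only the outer square brackets
- leave all other square brackets in place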

Regex: \[(<centd>[\[\]]<\/centd>)\]|\[(<\/?centd>)\]

Replace with: $1$2
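
A hedged sketch of the same replacement in Java follows. The class name CentdBracketStripper and the helper strip are illustrative assumptions. The forward slash is written without a backslash here because Java does not require it to be escaped.

```java
import java.util.regex.Pattern;

class CentdBracketStripper {
    // Group 1: a centd element wrapping a single bracket.
    // Group 2: a lone opening or closing centd tag.
    private static final Pattern CENTD = Pattern.compile(
            "\\[(<centd>[\\[\\]]</centd>)\\]|\\[(</?centd>)\\]");

    // Hypothetical helper: keeps the captured markup and drops the
    // surrounding square brackets. Only one of the two groups takes
    // part in any match; the other contributes an empty string.
    static String strip(String input) {
        return CENTD.matcher(input).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // prints: <centd>[</centd>purposes<centd>]</centd>
        System.out.println(strip("[<centd>[</centd>]purposes[<centd>]</centd>]"));
    }
}
```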

Sample 4
- Input: [<centd>[</centd>]purposes[<centd>]</centd>]
- Output: <centd>[</centd>purposes<centd>]</centd>

score 0 · Accepted Answer

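If you really are only looking for brackets around "cent", you can use the following approach (with lookbehind and lookahead):

Edited to keep some of the brackets, per the expected output: this is now a combination of positive and negative lookbehinds and lookaheads. In other words, regex is probably not the right tool for the general problem, but it does work for the strings provided, and then some.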

// surrounding
String test1 = "this is a [cent] and [cent] string";
// pseudo-nested
String test2 = "this is a [cent[cent] and [cent]cent] string";
// nested
String test3 = "this is a [cent[cent]] and [cent]cent]] string";
Pattern pattern = Pattern.compile("((?<!cent)\\[+(?=cent))|((?<=cent)\\]+(?!cent))");
Matcher matcher = pattern.matcher(test1);
if (matcher.find()) {
    System.out.println(matcher.replaceAll(""));
}
matcher = pattern.matcher(test2);
if (matcher.find()) {
    System.out.println(matcher.replaceAll(""));
}
matcher = pattern.matcher(test3);
if (matcher.find()) {
    System.out.println(matcher.replaceAll(""));
}

Output:

this is a cent and cent string
this is a cent[cent and cent]cent string
this is a cent[cent and cent]cent string

score 0 · Accepted Answer

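In the general case, regex is not suited to this. Nested structures form a recursive grammar, not a regular grammar. (That is why you don't parse HTML with regex, by the way.)

If you only have a limited depth of bracket nesting, you can write a regex for it. But you need to state your nesting depth up front, and the regex won't be pretty.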
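
To illustrate why the depth has to be fixed up front, here is a minimal sketch that builds a bracket-matching regex for a given maximum depth by wrapping the previous level once per extra level. The class BoundedDepthRegex and the method bracketRegex are hypothetical names introduced for this example, and the code assumes brackets are never escaped.

```java
import java.util.regex.Pattern;

class BoundedDepthRegex {
    // Hypothetical helper: returns a regex matching one bracketed group
    // whose brackets nest at most `depth` levels deep (depth >= 1).
    static String bracketRegex(int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be at least 1");
        }
        // Depth 1: brackets around text that contains no brackets.
        String regex = "\\[[^\\[\\]]*+\\]";
        // Each additional level allows the previous level inside it.
        for (int i = 1; i < depth; i++) {
            regex = "\\[(?:[^\\[\\]]++|" + regex + ")*+\\]";
        }
        return regex;
    }

    public static void main(String[] args) {
        System.out.println(bracketRegex(2));
        System.out.println(Pattern.matches(bracketRegex(2), "[a[b]c]"));   // true
        System.out.println(Pattern.matches(bracketRegex(2), "[a[b[c]]]")); // false
    }
}
```

The pattern grows with every level, which is the "won't be pretty" part: there is no single finite pattern of this shape that covers arbitrary depth.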

score 0 · Accepted Answer

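Assumptions

From the question, the assumption is that brackets are nested no more than 2 levels deep. The brackets are also assumed to be balanced.

I further assume that escaping of [ and ] is not allowed.

I also assume that, where brackets are nested, only the first opening bracket [ and the last closing bracket ] of the inner brackets are kept. Everything else, i.e. the top-level brackets and the remaining inner brackets, is removed.

For example: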

only[single] [level] outside[text more [text] some [text]moreeven[more]text[bracketed]] still outside

After replacement, this becomes:

onlysingle level outsidetext more [text some textmoreevenmoretextbracketed] still outside

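No assumptions are made beyond those above.

If you can make assumptions about the spacing before and after the brackets, you can use the simpler solution provided by Denomales. Otherwise, my solution below works without that assumption.

Solution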

private static String replaceBracket(String input) {
    // Search for singly and doubly bracketed text
    Pattern p = Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");
    Matcher matcher = p.matcher(input);

    StringBuffer output = new StringBuffer(input.length());

    while (matcher.find()) {
        // Take the text inside the outer most bracket
        String innerText = matcher.group(1);
        int startIndex = innerText.indexOf("[");
        int endIndex;

        String replacement;

        if (startIndex != -1) {
            // 2 levels of nesting
            endIndex = innerText.lastIndexOf("]");

            // Remove all [] except for first [ and last ]
            replacement = 
                // Text before and including first [
                innerText.substring(0, startIndex + 1) + 
                // Text in between, stripped of all the brackets []
                innerText.substring(startIndex + 1, endIndex).replaceAll("[\\[\\]]", "") +
                // Text after and including last ]
                innerText.substring(endIndex);
        } else {
            // No nesting
            replacement = innerText;
        }

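        // Quote the replacement so that any '$' or '\' in the text is taken literally
        matcher.appendReplacement(output, Matcher.quoteReplacement(replacement));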
    }

    matcher.appendTail(output);

    return output.toString();
}

Explanation

The only thing worth explaining here is the regex. For the rest, see the documentation of the Matcher class.

"\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]"

In raw form (what you get when you print the string):

\[((?:[^\[\]]++|\[[^\[\]]*+\])*+)\]

Let's break it down (whitespace is insignificant):

\[                    # Outermost opening bracket
(                     # Capturing group 1
  (?:
    [^\[\]]++         # Text that doesn't contain []
    |                 # OR
    \[[^\[\]]*+\]     # A nested bracket containing text without []
  )*+
)                     # End of capturing group 1
\]                    # Outermost closing bracket

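I used the possessive quantifiers *+ and ++ to prevent the regex engine from backtracking. The version with normal greedy quantifiers, \[((?:[^\[\]]+|\[[^\[\]]*\])*)\], still works, but it is slightly less efficient and may cause a StackOverflowError on a large enough input.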
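
As a small, hedged check of that claim, the sketch below runs both variants over one sample string and collects what group 1 captures. The class QuantifierComparison, the helper innerTexts, and the sample input are assumptions made for illustration. It only shows that the two patterns agree on this input; it does not measure the performance difference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierComparison {
    // Possessive version, as used in the solution.
    static final Pattern POSSESSIVE = Pattern.compile(
            "\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");
    // Same structure with ordinary greedy quantifiers.
    static final Pattern GREEDY = Pattern.compile(
            "\\[((?:[^\\[\\]]+|\\[[^\\[\\]]*\\])*)\\]");

    // Hypothetical helper: the text inside each outermost bracket pair.
    static List<String> innerTexts(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "only[single] outside[text more [text] some]";
        // Both lines print: [single, text more [text] some]
        System.out.println(innerTexts(POSSESSIVE, input));
        System.out.println(innerTexts(GREEDY, input));
    }
}
```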

score -1 · Accepted Answer

You can use a Java Matcher to convert the brackets. I wrote one for you below:

String input = "this is a [cent[cent] and [cent]cent] string";
Pattern p = Pattern.compile("\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*+)\\]");
Matcher m = p.matcher(input);
