0

我想将ANSI Escape序列转换为IRC 颜色序列。

所以我写了一个正则表达式 1\e\[([\d;]+)?m但是,shell_output_string.replaceFirst ("\\e\\[([\\d;]+)?m", "$1")它将返回匹配的子字符串和其余不匹配的子字符串。

然后我写了正则表达式 2 .*\e\[([\d;]+)?m.*,希望它可以匹配整个字符串并用匹配的子字符串替换它,但是,replaceFirst (".*\\e\\[([\\d;]+)?m.*", "$1")返回空字符串,但是matches (".*\\e\\[([\\d;]+)?m.*")is true。这个正则表达式有什么问题?

以下问题与此问题非常相似:Pattern/Matcher group() to obtain substring in Java?

示例代码

import java.util.regex.*;
public class AnsiEscapeToIrcEscape
{
    public static void main (String[] args)
    {
//# grep --color=always bot /etc/passwd
//
//bot:x:1000:1000:bot:/home/bot:/bin/bash
byte[] shell_output_array = {
0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#1 - #11)
0x62, 0x6F, 0x74,   // bot  (#12 - #14)
0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#15 - #20)
0x3A, 0x78, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A,   // :x:1000:1000:    (#21 - #33)
0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#34 - #44)
0x62, 0x6F, 0x74,   // bot  (#45 - #47)
0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#48 - #53)
0x3A, 0x2F, 0x68, 0x6F, 0x6D, 0x65, 0x2F,   // :/home/  (#54 - #60)
0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#61 - #71)
0x62, 0x6F, 0x74,   // bot  (#72 - #74)
0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#75 - #80)
0x3A, 0x2F, 0x62, 0x69, 0x6E, 0x2F, 0x62, 0x61, 0x73, 0x68, // :/bin/bash   (#81 - #90)
};
        String shell_output = new String (shell_output_array);
        System.out.println (shell_output);
        System.out.println ("total " + shell_output_array.length + " bytes");

        final String CSI_REGEXP = "\\e\\[";
        final String CSI_SGR_REGEXP_First = CSI_REGEXP + "([\\d;]+)?m";
        final String CSI_SGR_REGEXP = ".*" + CSI_SGR_REGEXP_First + ".*";

        System.out.println (shell_output.replaceFirst(CSI_SGR_REGEXP_First, "$1"));
        System.out.println (shell_output.replaceFirst(CSI_SGR_REGEXP, "$1"));
    }
}
4

2 回答 2

1

正则表达式是贪婪的——也就是说,每个模式都会尝试尽可能多地匹配输入。

这意味着当模式以 .* 开头时,模式的该部分将尝试覆盖尽可能多的输入文本 - 如此有效地强制模式的其余部分尝试从末尾开始查找匹配项向前面工作的输入字符串。

那么,从字符串末尾开始,其余模式的第一个匹配是什么(或者,如果您愿意,最后一个匹配的子字符串是什么)?它位于您输入的倒数第二行,仅包含 ^[m

这匹配是因为模式的整个 ([\d;]+) 部分由以下内容变为可选?.

反过来,这意味着,由于最终表达式没有数字或 ;,$1 组是空的 - 因此您得到空字符串输出。

至少,这是我认为没有靠近 Java 机器来测试它的情况。希望能帮助到你。

于 2013-10-21T09:34:38.107 回答
0
    The API of String's replaceFirst says :


     replaceFirst

    public String replaceFirst(String regex,
                               String replacement)

        Replaces the first substring of this string that matches the given regular expression with the given replacement.

        An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression

            Pattern.compile(regex).matcher(str).replaceFirst(repl)

        Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceFirst(java.lang.String). Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.

        Parameters:
            regex - the regular expression to which this string is to be matched
            replacement - the string to be substituted for the first match 
        Returns:
            The resulting String 
        Throws:
            PatternSyntaxException - if the regular expression's syntax is invalid
        Since:
            1.4
        See Also:
            Pattern



Please read the Note Part which specifies that the \ and $ may cause the result to be different.
You can use Pattern and Matcher instead.

Example  
public class RegexMatches
{
    public static void main( String args[] ){

      // String to be scanned to find the pattern.
     // String line = "This order was placed for QT3000! OK?";
     // String pattern = "(.*)(\\d+)(.*)";

      byte[] shell_output_array = {
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#1 - #11)
              0x62, 0x6F, 0x74,   // bot  (#12 - #14)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#15 - #20)
              0x3A, 0x78, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A,   // :x:1000:1000:    (#21 - #33)
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#34 - #44)
              0x62, 0x6F, 0x74,   // bot  (#45 - #47)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#48 - #53)
              0x3A, 0x2F, 0x68, 0x6F, 0x6D, 0x65, 0x2F,   // :/home/  (#54 - #60)
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#61 - #71)
              0x62, 0x6F, 0x74,   // bot  (#72 - #74)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#75 - #80)
              0x3A, 0x2F, 0x62, 0x69, 0x6E, 0x2F, 0x62, 0x61, 0x73, 0x68, // :/bin/bash   (#81 - #90)
              };
      String line = new String (shell_output_array);
      //String pattern = "(.*)(\\d+)(.*)";
      final String CSI_REGEXP = "\\e\\[";
      final String CSI_SGR_REGEXP_First = CSI_REGEXP + "([\\d;]+)?m";
      final String CSI_SGR_REGEXP = ".*" + CSI_SGR_REGEXP_First + ".*";

      // Create a Pattern object
      Pattern r = Pattern.compile(CSI_SGR_REGEXP);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      while (m.find()) {
         System.out.println(m.start() + "  " + m.end());
         System.out.println("Found value: " + m.group());
      } 
   }
}
于 2013-10-21T09:17:45.107 回答