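I need to identify server events in a log file, and I am using pattern matching to do it. My regular expression does not match anything. Please check whether the regex itself is wrong or whether the problem lies elsewhere.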

Sample input:

2009/12/14 11:49:20.55                  00 STARTUP  Distributed Access Infrastructure V1.1.0   
2009/12/14 11:49:20.55                  01 STARTUP    Tools Access Server initialization started   
2009/12/14 11:49:20.55 TAS#####EC05003E 00 STARTUP  Environment:    
2009/12/14 11:49:20.55 TAS#####EC05003E 01 STARTUP    Job.....DAITAS     System...EC05      ASID.....003E    
2009/12/14 11:49:20.55 TAS#####EC05003E 02 STARTUP    User....USRT001    Group....SYS1      JobNum...STC00079
2009/12/14 11:49:20.55 TAS#####EC05003E 03 STARTUP    Local...GMT-08     GMT......2009/12/14 19:49

My code is:

public void map(Object key, Text value, Context context) throws IOException , InterruptedException{

        String input=value.toString();
        String delimiter= "[\n]";
        String[] tokens=input.split(delimiter);
        String sample = null;

        Pattern pattern;
        String regex= " \\s+\\d+\\s+[a-z,A-Z]+\\s ";
        pattern=Pattern.compile(regex);




        for(int i=0;i<tokens.length;i++){
            sample=tokens[i];
            System.out.println(sample.toString());
            System.out.println("enter here");

            Matcher match=pattern.matcher(sample);
            boolean val = match.matches();

            System.out.println("the conditions" + val);
            System.out.println("enter here 2");
            if(val){
                System.out.println("the regex is found" + val);
                logEvent.set(sample);
                System.out.println("the value of logEvent is "+ logEvent);
            }
            else{
                logInformation.set(sample);
                System.out.println("the log informaTION" + logInformation);
            }
        context.write(logEvent, logInformation);    

I need to recognize the STARTUP events.

Thanks.

1 Answer

Try this (C#):

try {
    Regex regexObj = new Regex(@"(?im)\s+(?<event>\d+\s+[a-z]+)\s+(?<details>[^\r\n]+)$");
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success) {
        for (int i = 1; i < matchResults.Groups.Count; i++) {
            Group groupObj = matchResults.Groups[i];
            if (groupObj.Success) {
                // matched text: groupObj.Value
                // match start: groupObj.Index
                // match length: groupObj.Length
            } 
        }
        matchResults = matchResults.NextMatch();
    } 
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
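
Since the question is written in Java while the snippet above uses the .NET Regex API, here is a minimal sketch of how the same pattern might be applied with java.util.regex. It assumes Java 7 or later (needed for named groups); the class name LogEventMatcher and the helper parse are hypothetical names chosen for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogEventMatcher {
    // Same pattern as the C# snippet, with backslashes doubled for a Java string literal.
    // (?im) = case-insensitive, and ^/$ match at line breaks.
    private static final Pattern EVENT_LINE = Pattern.compile(
            "(?im)\\s+(?<event>\\d+\\s+[a-z]+)\\s+(?<details>[^\\r\\n]+)$");

    /** Returns {event, details} for the first match in the line, or null if there is none. */
    public static String[] parse(String line) {
        Matcher m = EVENT_LINE.matcher(line);
        // find() searches for a matching substring; matches() would demand
        // that the whole line, timestamp included, fit the pattern.
        if (!m.find()) {
            return null;
        }
        // The details group is greedy up to end of line, so trim trailing padding.
        return new String[] { m.group("event"), m.group("details").trim() };
    }

    public static void main(String[] args) {
        String line = "2009/12/14 11:49:20.55 TAS#####EC05003E 00 STARTUP  Environment:    ";
        String[] parts = parse(line);
        System.out.println(parts == null ? "no match" : parts[0] + " | " + parts[1]);
    }
}
```

In the mapper from the question, the event group could be passed to logEvent.set(...) and the details group to logInformation.set(...), though that wiring is only a suggestion.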

Explanation of the regular expression:

@"
(?im)          # Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m)
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<event>      # Match the regular expression below and capture its match into backreference with name “event”
   \d             # Match a single digit 0..9
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   \s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   [a-z]          # Match a single character in the range between “a” and “z”
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s             # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?<details>    # Match the regular expression below and capture its match into backreference with name “details”
   [^\r\n]        # Match a single character NOT present in the list below
                     # A carriage return character
                     # A line feed character
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
$              # Assert position at the end of a line (at the end of the string or before a line break character)
"

Hope this helps.
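
A note on why the pattern in the question probably never reports a match in Java: Matcher.matches() succeeds only when the entire input fits the pattern, whereas find() looks for a matching substring. The question's regex also begins and ends with a literal space, so it needs at least two whitespace characters before the digits, and its character class [a-z,A-Z] accepts a comma as well. The sketch below illustrates this on two lines shaped like the sample input; the class name MatchesVsFind and the RELAXED pattern are assumptions made for illustration, not a definitive fix.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The regex from the question, unchanged (note the literal leading and trailing space).
    static final Pattern ORIGINAL = Pattern.compile(" \\s+\\d+\\s+[a-z,A-Z]+\\s ");

    // Hypothetical relaxed variant: no literal spaces, no comma in the character class.
    static final Pattern RELAXED = Pattern.compile("\\s+\\d+\\s+[a-zA-Z]+\\s+");

    /** True only if the entire line matches the pattern. */
    static boolean wholeLineMatches(Pattern p, String line) {
        return p.matcher(line).matches();
    }

    /** True if any substring of the line matches the pattern. */
    static boolean containsMatch(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        // Several spaces before the event number, as in the first sample lines.
        String wideGap = "2009/12/14 11:49:20.55    00 STARTUP  Distributed Access Infrastructure V1.1.0";
        // A single space before the event number, as in the TAS##### lines.
        String singleGap = "2009/12/14 11:49:20.55 TAS#####EC05003E 00 STARTUP  Environment:";

        System.out.println(wholeLineMatches(ORIGINAL, wideGap));  // false: the line starts with the timestamp
        System.out.println(containsMatch(ORIGINAL, wideGap));     // true
        System.out.println(containsMatch(ORIGINAL, singleGap));   // false: only one space before "00"
        System.out.println(containsMatch(RELAXED, wideGap));      // true
        System.out.println(containsMatch(RELAXED, singleGap));    // true
    }
}
```

If whole-line matching is preferred, the pattern would have to describe the timestamp and the trailing text as well, for example by wrapping it in .* on both sides.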

answered 2012-05-28T05:31:30.253