java - How would you use a regular expression to ignore strings that contain a specific substring?

Question

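How would I use a negative lookbehind (or any other method) in a regular expression to ignore strings that contain a specific substring?

I have already read two previous StackOverflow questions: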
java-regexp-for-file-filtering
regex-to-match-against-something-that-is-not-a-specific-substring

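They are almost what I want... my problem is that the string does not end with the text I want to ignore. If it did, this would not be a problem.

I have a feeling this has to do with the fact that lookarounds are zero-width, and that something is matching on a second pass through the string... but I am not too sure of the internals.

Anyway, if anyone is willing to take the time to explain it, I would greatly appreciate it.

Here is an example of an input string that I want to ignore: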

192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/ HTTP/1.1" 200 2246

Here is an example of an input string that I want to keep for further evaluation:

192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/content.js HTTP/1.1" 200 2246

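The key for me is that I want to ignore any HTTP GET that requests a document root default page.

Below is my little test harness and the best RegEx I have come up with so far.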

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexTestHarness {
    public static void main(String[] args) {
        String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/"; // This works
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/"; // This works

        // Pattern under test: the negative lookbehind is placed after the $ anchor.
        String inRegEx = "^.*(?:GET).*$(?<!.?/ HTTP/)";
        try {
            Pattern pattern = Pattern.compile(inRegEx);
            Matcher matcher = pattern.matcher(inString);

            if (matcher.find()) {
                System.out.printf("I found the text \"%s\" starting at "
                        + "index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
            } else {
                System.out.printf("No match found.%n");
            }
        } catch (PatternSyntaxException pse) {
            // Reported when the pattern itself fails to compile.
            System.out.println("Invalid RegEx: " + inRegEx);
            pse.printStackTrace();
        }
    }
}
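
A small sketch that may help explain the "This works" comments in the harness: because the lookbehind follows `$`, it is checked at the end of the input, so it only ever inspects the last few characters of the whole line rather than the text in front of ` HTTP/`. The class name `LookbehindAtEndDemo` and the helper `found` are made up for this illustration; the pattern is the one from the question.

```java
import java.util.regex.Pattern;

class LookbehindAtEndDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^.*(?:GET).*$(?<!.?/ HTTP/)");

    // Hypothetical helper: true when the pattern is found anywhere in the input.
    static boolean found(String input) {
        return ORIGINAL.matcher(input).find();
    }

    public static void main(String[] args) {
        String prefix = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/";

        // Truncated line: the input ends with "/ HTTP/", so the lookbehind rejects it.
        System.out.println(found(prefix + " HTTP/"));                // false

        // Full line: the input ends with " 200 2246", so the lookbehind never sees "/ HTTP/".
        System.out.println(found(prefix + " HTTP/1.1\" 200 2246"));  // true
    }
}
```

Under this reading, the lookbehind needs to sit directly in front of ` HTTP/` inside the pattern, not after the end-of-line anchor.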

score 4 · Accepted Answer

Could you match any path that does not end with a `/`?

String inRegEx = "^.* \"GET (.*[^/]) HTTP/.*$";

This can also be done with a negative lookbehind:

String inRegEx = "^.* \"GET (.+)(?<!/) HTTP/.*$";

Here, (?<!/) says "the text immediately preceding this position must not match /".
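
As a rough check, both patterns can be run against the two sample log lines from the question. This is only a sketch: the class name `TrailingSlashFilter` and the helper `keep` are invented here, and `matches()` is used because both patterns are anchored with `^` and `$`.

```java
import java.util.regex.Pattern;

class TrailingSlashFilter {
    // Character-class variant: the captured path must end in a character other than '/'.
    static final Pattern CHAR_CLASS = Pattern.compile("^.* \"GET (.*[^/]) HTTP/.*$");
    // Lookbehind variant: the position just before " HTTP/" must not be preceded by '/'.
    static final Pattern LOOKBEHIND = Pattern.compile("^.* \"GET (.+)(?<!/) HTTP/.*$");

    // Hypothetical helper: true when the whole line matches, i.e. the line should be kept.
    static boolean keep(Pattern pattern, String line) {
        return pattern.matcher(line).matches();
    }

    public static void main(String[] args) {
        String directoryHit = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String fileHit = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        for (Pattern pattern : new Pattern[] {CHAR_CLASS, LOOKBEHIND}) {
            System.out.println(pattern.pattern());
            System.out.println("  keep directory request? " + keep(pattern, directoryHit)); // false
            System.out.println("  keep file request?      " + keep(pattern, fileHit));      // true
        }
    }
}
```

In both variants, group 1 holds the requested path when the line is kept.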

score 1 · Accepted Answer

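Maybe I am missing something here, but couldn't you skip the regular expression entirely and simply ignore any line for which the following is true: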

string.contains("/ HTTP")

After all, a file path will never end with a slash.
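
A minimal sketch of that idea as a line filter. The names `ContainsFilter`, `isDirectoryRequest` and `keepFileRequests` are hypothetical, and the check assumes that the substring `/ HTTP` only ever appears at the end of the request path, which holds for the sample lines in the question but may not hold for every log format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContainsFilter {
    // Assumption: "/ HTTP" appears in a line only when the requested path ends in '/'.
    static boolean isDirectoryRequest(String line) {
        return line.contains("/ HTTP");
    }

    // Returns the lines that are not directory requests, in their original order.
    static List<String> keepFileRequests(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!isDirectoryRequest(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246",
                "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246");
        // Only the content.js line is printed.
        for (String line : keepFileRequests(lines)) {
            System.out.println(line);
        }
    }
}
```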

score 0 · Accepted Answer


I would use something like this:

"\"GET /FOO/BAR/[^ ]+ HTTP/1\\.[01]\""

This matches only requests for something beyond just /FOO/BAR/.
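
A short sketch of how this pattern might be applied. The class name `FooBarFileRequest` and the method `isFileRequest` are invented for illustration; `find()` is used instead of `matches()` because the pattern is unanchored and describes only the quoted request field.

```java
import java.util.regex.Pattern;

class FooBarFileRequest {
    // "\\." in Java source is the regex "\." (a literal dot), so this accepts HTTP/1.0 and HTTP/1.1.
    static final Pattern FILE_UNDER_FOO_BAR =
            Pattern.compile("\"GET /FOO/BAR/[^ ]+ HTTP/1\\.[01]\"");

    // Hypothetical helper: true when the line requests something below /FOO/BAR/.
    static boolean isFileRequest(String line) {
        return FILE_UNDER_FOO_BAR.matcher(line).find();
    }

    public static void main(String[] args) {
        String directoryHit = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String fileHit = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        System.out.println(isFileRequest(directoryHit)); // false: nothing follows /FOO/BAR/
        System.out.println(isFileRequest(fileHit));      // true: content.js follows /FOO/BAR/
    }
}
```

Unlike the other approaches in this thread, this pattern hard-codes the /FOO/BAR/ prefix, so requests under any other directory are not matched at all.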

answered 2009-02-09T23:06:23.017

score -1 · Accepted Answer

If you are writing regular expressions this complex, I would suggest building up a library of resources outside of StackOverflow.
