我发现这个正则表达式可以匹配 url(最初是 Daring Fireball 在 Javascript 中),它在 java 中有效,但在某些情况下非常慢:
private final static String pattern =
"\\b" +
"(" + // Capture 1: entire matched URL
"(?:" +
"[a-z][\\w-]+:" + // URL protocol and colon
"(?:" +
"/{1,3}" + // 1-3 slashes
"|" + // or
"[a-z0-9%]" + // Single letter or digit or '%'
// (Trying not to match e.g. "URI::Escape")
")" +
"|" + // or
"www\\d{0,3}[.]" + // "www.", "www1.", "www2." … "www999."
"|" + // or
"[a-z0-9.\\-]+[.][a-z]{2,4}/" + // looks like domain name followed by a slash
")" +
"(?:" + // One or more:
"[^\\s()<>]+" + // Run of non-space, non-()<>
"|" + // or
"\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]+\\)))*\\)" + // balanced parens, up to 2 levels
")+" +
"(?:" + // End with:
"\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]+\\)))*\\)" + // balanced parens, up to 2 levels
"|" + // or
"[^\\s`!\\-()\\[\\]{};:'\".,<>?«»“”‘’]" + // not a space or one of these punct chars (updated to add a 'dash'
")" +
")";
我发现主题:Java 正则表达式运行速度非常慢,问题出在这段代码中:
"(?:" + // One or more:
"[^\\s()<>]+" + // Run of non-space, non-()<>
"|" + // or
"\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]+\\)))*\\)" + // balanced parens, up to 2 levels
")+"
似乎要解决这个问题,我需要使这些内部量词具有所有格(实际上是嵌套的),但我不知道该怎么做谢谢你的建议,对不起我的英语不好!