
I am weak at regular expressions, and the regex I am using (found on the internet) only partially solves my problem. I need to wrap a URL found in text input in an anchor tag using Java. Here is my code:

String text = "Hi please visit www.google.com";
String reg = "\\b(([\\w-]+://?|www[.])[^\\s()<>]+(?:\\([\\w\\d]+\\)|([^[:punct:]\\s]|/)))";
String s = text.replaceAll(reg, "<a href='$1'>$1</a>");
System.out.println(s);

The output currently is Hi please visit <a href='www.google.c'>www.google.c</a>om. What's wrong with the regex?

I need to parse some text and display any URL entered in a text field as a hot link in a JSP page. The expected output would be

Hi please visit <a href='www.google.com'>www.google.com</a>

Edit

The following regex

(http(s)?://)?(www(\.\w+)+[^\s.,"']*)

works like a charm for URLs ending in .com but fails for other extensions such as .jsp. Is there any way to make it work for all kinds of extensions?


2 Answers


To answer your question of why the regex does not work: it does not follow Java's regex syntax rules.

Specifically:

[^[:punct:]\s]

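does not work the way you expect, because Java does not recognize [:punct:]. Instead, it treats it as a nested character class. That in turn makes the ^ illegal in that context, so Java ignores it, leaving you with a character class that matches the same as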

[:punct\s]

which matches only the c of com, so your match ends there.
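One possible repair of the original pattern, sketched on the assumption that the POSIX bracket syntax is the only thing wrong with it: java.util.regex spells that class `\p{Punct}`, so it can be swapped in for `[:punct:]`. The class name `PunctFix` and the helper `linkify` are invented for this illustration.

```java
class PunctFix {
    // Same pattern as in the question, except that [:punct:] is replaced
    // by \p{Punct}, the java.util.regex spelling of the POSIX class.
    static final String REG =
        "\\b(([\\w-]+://?|www[.])[^\\s()<>]+(?:\\([\\w\\d]+\\)|([^\\p{Punct}\\s]|/)))";

    // Hypothetical helper: wraps every match in an anchor tag.
    static String linkify(String text) {
        return text.replaceAll(REG, "<a href='$1'>$1</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("Hi please visit www.google.com"));
        // prints: Hi please visit <a href='www.google.com'>www.google.com</a>
    }
}
```

Because the last character of a match must be a non-punctuation character or a slash, a sentence-ending period after the URL stays outside the link.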

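As for the question of how to find URLs in a block of text, I suggest reading Jan Goyvaerts's excellent blog post Detecting URLs in a block of text. You will need to decide for yourself how sensitive and how specific you want your regex to be.

For example, the solution proposed at the end of the post would translate to Java as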

String resultString = subjectString.replaceAll(
    "(?imx)\\b(?:(?:https?|ftp|file)://|www\\.|ftp\\.)\n" +
    "(?:\\([-A-Z0-9+&@\\#/%=~_|$?!:,.]*\\)|\n" +
    "      [-A-Z0-9+&@\\#/%=~_|$?!:,.])*\n" +
    "(?:\\([-A-Z0-9+&@\\#/%=~_|$?!:,.]*\\)|\n" +
    "      [A-Z0-9+&@\\#/%=~_|$])", "<a href=\"$0\">$0</a>");
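If many strings need converting, the same expression could be compiled once and reused. The following is only a sketch of that idea; the class name `Linkifier`, the constant `URL_PATTERN`, and the method `linkify` are assumptions made for the example, not part of the blog post.

```java
import java.util.regex.Pattern;

class Linkifier {
    // The expression shown above, compiled once for reuse.
    // (?imx) turns on case-insensitive, multiline and comments mode; in
    // comments mode whitespace in the pattern is ignored and '#' is escaped.
    static final Pattern URL_PATTERN = Pattern.compile(
        "(?imx)\\b(?:(?:https?|ftp|file)://|www\\.|ftp\\.)\n" +
        "(?:\\([-A-Z0-9+&@\\#/%=~_|$?!:,.]*\\)|\n" +
        "      [-A-Z0-9+&@\\#/%=~_|$?!:,.])*\n" +
        "(?:\\([-A-Z0-9+&@\\#/%=~_|$?!:,.]*\\)|\n" +
        "      [A-Z0-9+&@\\#/%=~_|$])");

    // Hypothetical helper: $0 is the whole match, used as both href and label.
    static String linkify(String text) {
        return URL_PATTERN.matcher(text).replaceAll("<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("see http://example.com/page.jsp, thanks"));
    }
}
```

The final character class omits `,` and `.`, so punctuation that merely follows a URL in running text is not swallowed into the link, and extensions such as .jsp are handled like any other path.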
answered 2013-07-01T06:38:36.153

Java does recognize POSIX character classes (see the javadoc), but the syntax is slightly different. It looks like this:

\p{Punct}

But I would simplify the URL regex to:

(?i)(http(s)?://)?((www(\.\w+)+|(\d{1,3}\.){3}\d{1,3})[^\s,"']*(?<!\.))

and only elaborate on it when you find a test case that breaks it.

As a line of Java, it would be:

text = text.replaceAll("(?i)(http(s)?://)?((www(\\.\\w+)+|(\\d{1,3}\\.){3}\\d{1,3})[^\\s,\"']*(?<!\\.))", "<a href=\"http$2://$3\">$3</a>");

Note the neat capture of the "s" in "https" (if found), which is restored in the link when needed.
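To see how the captured "s" behaves, here is a small sketch built around the simplified expression, with every regex backslash doubled for the Java string literal. The class name `SimpleLinkifier` and the method `linkify` are hypothetical names chosen for the example.

```java
class SimpleLinkifier {
    // Group 2 is the optional "s" of "https"; group 3 is the URL without scheme.
    // When group 2 did not take part in the match, $2 expands to nothing.
    static String linkify(String text) {
        return text.replaceAll(
            "(?i)(http(s)?://)?((www(\\.\\w+)+|(\\d{1,3}\\.){3}\\d{1,3})[^\\s,\"']*(?<!\\.))",
            "<a href=\"http$2://$3\">$3</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("Hi please visit www.google.com"));
        // prints: Hi please visit <a href="http://www.google.com">www.google.com</a>
        System.out.println(linkify("See https://www.example.com/index.jsp."));
        // prints: See <a href="https://www.example.com/index.jsp">www.example.com/index.jsp</a>.
    }
}
```

The negative lookbehind at the end keeps a sentence-ending period out of the link, which is why the second example leaves the final "." after the closing tag.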

answered 2013-07-01T07:18:13.233