java - Matching subdomains and the top-level domain with a regular expression in Java

Question

Following up on this question: Regex to match pattern with subdomain in java

I am using the pattern below to match a domain and its subdomains:

  Pattern pattern = Pattern.compile("http://([a-z0-9]*.)example.com");

This pattern matches the following:

http://asd.example.com
http://example.example.com
http://www.example.com

But it does not match:

http://example.com

Can anyone tell me how to match http://example.com as well?

score 1 · Accepted Answer

Just make the first part optional with a ?:

Pattern pattern = Pattern.compile("http://([a-z0-9]*\\.)?example\\.com");

Note that an unescaped . matches any character, so you should use \\. to match a literal dot.
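
To make the difference concrete, here is a minimal sketch that runs the question's original pattern and the corrected one side by side. The class and method names (SubdomainMatchDemo, matchesOriginal, matchesFixed) are made up for illustration, and it assumes the whole string should match, so it uses matches() rather than find().

```java
import java.util.regex.Pattern;

public class SubdomainMatchDemo {
    // Pattern from the question: the subdomain group is mandatory and the dots are unescaped.
    static final Pattern ORIGINAL = Pattern.compile("http://([a-z0-9]*.)example.com");
    // Pattern from this answer: the subdomain group is optional and the dots are literal.
    static final Pattern FIXED = Pattern.compile("http://([a-z0-9]*\\.)?example\\.com");

    static boolean matchesOriginal(String url) {
        return ORIGINAL.matcher(url).matches();
    }

    static boolean matchesFixed(String url) {
        return FIXED.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://asd.example.com",
            "http://www.example.com",
            "http://example.com",
            "http://wwwXexampleYcom" // not a real host; only the unescaped dots let it through
        };
        for (String url : urls) {
            System.out.println(url
                + " original=" + matchesOriginal(url)
                + " fixed=" + matchesFixed(url));
        }
    }
}
```

With the corrected pattern, http://example.com matches and http://wwwXexampleYcom no longer does. The group still allows only one subdomain label, so something like http://a.b.example.com would not match as a whole string.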

score 0 · Accepted Answer

You can use this regex pattern to extract the scheme and host part of a URL:

\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}

For example:

Input  = http://www.google.com/search?q=a
Output = http://www.google.com

Input  = ftp://www.google.com/search?q=a
Output = ftp://www.google.com

Input  = www.google.com/search?q=a
Output = www.google.com

Here, \p{L}{0,10} matches the scheme (http, https, ftp, and possibly others I don't know of), (?:://)? matches the :// separator if it is present, and [\p{L}\.]{1,50} matches the host part, such as foo.bar.foo.com. The rest of the URL is cut off.

And here is the Java code that does the job:

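import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class added so the snippet compiles on its own; the name is arbitrary.
public class DomainExtractor {
    // Optional scheme letters, optional "://", then 1 to 50 letters or dots for the host.
    public static final String DOMAIN_PATTERN = "\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}";
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern DOMAIN = Pattern.compile(DOMAIN_PATTERN);

    public static String getDomain(String url) {
        if (url == null || url.isEmpty()) {
            return "";
        }
        Matcher m = DOMAIN.matcher(url);
        // find() returns the first matching substring, which drops the path and query.
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(getDomain("www.google.com/search?q=a"));
    }
}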

Output = www.google.com

Finally, if you only want to match hosts that end in "example.com", append it to the end of the pattern. The quantifier changes from {1,50} to {0,50} so that a bare example.com still matches:

\\p{L}{0,10}(?:://)?[\\p{L}\\.]{0,50}example\\.com

This matches any host that ends in "example.com":

Input  = http://www.foo.bar.example.com/search?q=a
Output = http://www.foo.bar.example.com
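
As a quick check of that variant, the sketch below runs it against a few inputs. The class name ExampleDomainDemo and the helper firstMatch are illustrative only. One thing it shows is that the pattern does not require a dot before "example", so a host like notexample.com is also picked up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExampleDomainDemo {
    // The example.com variant from the answer above.
    static final Pattern EXAMPLE_DOMAIN =
        Pattern.compile("\\p{L}{0,10}(?:://)?[\\p{L}\\.]{0,50}example\\.com");

    // Returns the first substring that matches, or "" when nothing matches.
    static String firstMatch(String url) {
        Matcher m = EXAMPLE_DOMAIN.matcher(url);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Subdomains and the bare domain are both matched.
        System.out.println(firstMatch("http://www.foo.bar.example.com/search?q=a"));
        System.out.println(firstMatch("http://example.com"));
        // Caveat: letters directly in front of "example" are accepted too.
        System.out.println(firstMatch("http://notexample.com/"));
        // Unrelated hosts produce an empty string.
        System.out.println(firstMatch("http://www.google.com/"));
    }
}
```

If hosts such as notexample.com must be rejected, the pattern would need an extra boundary check in front of "example"; that is left out here to keep the sketch close to the answer's pattern.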

Note: \p{Ll} can be used instead of \p{L} when the URLs are known to be lowercase, because \p{Ll} matches only lowercase Unicode letters while \p{L} matches Unicode letters of any case. Hostnames are case-insensitive, though, so \p{L} is the safer choice for arbitrary input.
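
The effect of the character class can be seen in a small comparison. This is only a sketch: CharClassDemo and firstMatch are made-up names, and the WITH_DIGITS pattern is an assumed extension that is not part of the answer. It is included because neither \p{L} nor \p{Ll} matches digits or hyphens, so hosts containing them get truncated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Pattern from the answer: any Unicode letter.
    static final Pattern ANY_LETTER =
        Pattern.compile("\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}");
    // Same pattern restricted to lowercase letters, as the note suggests.
    static final Pattern LOWER_ONLY =
        Pattern.compile("\\p{Ll}{0,10}(?:://)?[\\p{Ll}\\.]{1,50}");
    // Assumed extension (not from the answer): also allow decimal digits and hyphens in the host.
    static final Pattern WITH_DIGITS =
        Pattern.compile("\\p{L}{0,10}(?:://)?[\\p{L}\\p{Nd}.-]{1,50}");

    // Returns the first substring that matches, or "" when nothing matches.
    static String firstMatch(Pattern p, String url) {
        Matcher m = p.matcher(url);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String mixedCase = "http://www.Google.com/";
        System.out.println(firstMatch(ANY_LETTER, mixedCase)); // full host
        System.out.println(firstMatch(LOWER_ONLY, mixedCase)); // stops at the uppercase G

        String withDigit = "http://www.example2.com/x";
        System.out.println(firstMatch(ANY_LETTER, withDigit));  // stops at the digit
        System.out.println(firstMatch(WITH_DIGITS, withDigit)); // full host
    }
}
```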
