java - Java regEx URL 匹配问题

Question

和往常一样提前谢谢你。

我正在尝试熟悉 regEx，但遇到了与 URL 匹配的问题。

这是一个示例网址：

www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

这是我的正则表达式分解的样子：

[site]/[dir]*?/[year]/[month]/[day]/[storyTitle]?/[id]/htmlpage.html

the[id]是一个长度为 22 个字符的字符串，可以是大写或小写字母，也可以是数字。但是，我不想从 URL 中提取它。只是澄清

现在，我需要从这个 url 中提取两个值。

首先，我需要提取目录。但是，[dir]是可选的，也可以是任意多的。换句话说，该参数不能存在，或者它可能是dir1/dir2/dir3 ..etc 。所以，离开我的第一个例子：

    www.examplesite.com/dir1/dir2/dir3/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

在这里，我需要提取dir1/dir2/dir3dir 是一个字符串，该字符串是一个包含所有小写字母的单词（即sports/mlb/games）。目录中没有数字，仅以此为例。

但在这个有效 URL 的示例中：

www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

没有[dir]，所以我不会提取任何东西。因此，这[dir]是可选的

其次，我需要像上面一样提取[storyTitle]where[storyTitle]也是可选的[dir]，但是如果有 astoryTitle则只能有一个。

所以离开我之前的例子

www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

在我需要提取'title-of-some-story'故事标题是破折号分隔的字符串的地方是有效的，这些字符串总是小写的。下面的例子也是有效的：

www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

在上面的例子中，没有[storyTitle]因此使它成为可选的

最后，为了彻底，没有 a[dir]和没有 a的 URL[storyTitle]也是有效的。例子：

www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html

是一个有效的 URL。任何输入都会有所帮助，我希望我很清楚。

score 1 · Accepted Answer

这是一个可行的示例。

public static void main(String[] args) {

    Pattern p = Pattern.compile("(?:http://)?.+?(/.+?)?/\\d+/\\d{2}/\\d{2}(/.+?)?/\\w{22}");

    String[] strings ={
            "www.examplesite.com/dir1/dir2/4444/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html"
    };
    for (int idx = 0; idx < strings.length; idx++) {
        Matcher m = p.matcher(strings[idx]);
        if (m.find()) {
            String dir = m.group(1);
            String title = m.group(2);
            if (title != null) {
                title = title.substring(1); // remove the leading /
            }
            System.out.println(idx+": Dir: "+dir+", Title: "+title);
        }
    }
}

score 0 · Accepted Answer

这是一个全正则表达式解决方案。

编辑：允许 http://

Java源码：

import java.util.*;
import java.lang.*;
import java.util.regex.*;

class Main
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String url = "http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url2 = "www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url3 = "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";

        String patternStr = "(?:http://)?[^/]*[/]?([\\S]*)/[\\d]{4}/[\\d]{2}/[\\d]{2}[/]?([\\S]*)/[\\S]*/[\\S]*";

        // Compile regular expression
        Pattern pattern = Pattern.compile(patternStr);


        // Match 1st url
        System.out.println("Match 1st URL:");
        Matcher matcher = pattern.matcher(url);

        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }


        // Match 2nd url
        System.out.println("\nMatch 2nd URL:");
        matcher = pattern.matcher(url2);

        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }


        // Match 3rd url
        System.out.println("\nMatch 3rd URL:");
        matcher = pattern.matcher(url3);

        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }
    }
}

输出：

Match 1st URL:
URL: http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir
TITLE: title-of-some-story

Match 2nd URL:
URL: www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir/dir2/dir3
TITLE: 

Match 3rd URL:
URL: www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: 
TITLE: title-of-some-story

java - Java regEx URL 匹配问题

2 回答 2

Related

Reference