
I'm trying to figure out how to write a regex that matches times. The times can look like this: 11:15-12:15, 11-12:15, 11-12 and so on. What I currently have is this:

\\d{2}:?\\d{0,2}-{1}\\d{2}:?\\d{0,2}

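This works until dates come into play. If a string like 2013-11-05 shows up, this regex will match on it. I don't want it to find dates. I know I should use a lookbehind, but I can't get it to work.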

In case it is of any interest, I'm using the Jsoup Element getElementsMatchingOwnText method.

The time strings are contained in HTML source code, like this (but with more text above and below):

<td class="text">2013-11-04</td>

2 Answers


Try this. Start with the base regex:

\d{1,2}(:\d\d)?-\d{1,2}(:\d\d)?

That is:

  • one-to-two digits, optionally followed by : and two more digits
  • followed by a hyphen
  • followed by one-to-two digits, optionally followed by : and two more digits

This matches all your core cases:

11-12
1-2
1:15-2
10-3:45
2:15-11:30

etc. Now mix in negative lookbehind and negative lookahead to invalidate matches that appear within undesired contexts. Let's invalidate the match when a digit or dash or colon appears directly to the left or right of the match:

The negative lookbehind: (?<!\d|-|:)

The negative lookahead: (?!\d|-|:)

Slap the negative lookbehind at the beginning and the negative lookahead at the end, and you get:

(?<!\d|-|:)(\d{1,2}(:\d\d)?-\d{1,2}(:\d\d)?)(?!\d|-|:)

or, as a Java String (by request):

Pattern p = Pattern.compile("(?<!\\d|-|:)(\\d{1,2}(:\\d\\d)?-\\d{1,2}(:\\d\\d)?)(?!\\d|-|:)");
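
A quick way to check the lookaround version is to run it over text that mixes time ranges with a date. This is only a sketch: the class name LookaroundDemo, the helper findRanges and the sample string are made up for illustration, loosely based on the HTML snippet in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Same pattern as above: the base regex wrapped in a negative
    // lookbehind and a negative lookahead.
    static final Pattern TIME_RANGE = Pattern.compile(
        "(?<!\\d|-|:)(\\d{1,2}(:\\d\\d)?-\\d{1,2}(:\\d\\d)?)(?!\\d|-|:)");

    // Hypothetical helper: collects every range found in the text.
    // Group 1 holds the whole range, e.g. "11-12:15".
    static List<String> findRanges(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TIME_RANGE.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample: a date cell followed by two time ranges.
        String sample = "<td class=\"text\">2013-11-04</td> 11:15-12:15 and 11-12";
        System.out.println(findRanges(sample));
    }
}
```

Every candidate start inside the date is preceded by a digit or a dash, or is not followed by a dash, so the date should be skipped while the two ranges are still found.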

Now while the lookaround has eliminated matches within dates, you're still matching some silly things like 99:99-88:88 because \d matches any digit 0-9. You can mix more restrictive character classes into this regex to address that issue. For example, with a 12-hour clock:

For the hour part, use

(1[0-2]|0?[1-9])

instead of

\d{1,2}

For the minute part use

(0[0-9]|[1-5][0-9])

instead of

\d\d

Mixing the more restrictive character classes into the regex yields this beast, which is nearly impossible to grok and maintain:

(?<!\d|-|:)(((1[0-2]|0?[1-9]))(:((0[0-9]|[1-5][0-9])))?-(1[0-2]|0?[1-9])(:((0[0-9]|[1-5][0-9])))?)(?!\d|-|:)

As Java code:

Pattern p = Pattern.compile("(?<!\\d|-|:)(((1[0-2]|0?[1-9]))(:((0[0-9]|[1-5][0-9])))?-(1[0-2]|0?[1-9])(:((0[0-9]|[1-5][0-9])))?)(?!\\d|-|:)");
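
One way to keep the stricter version readable is to assemble it from named pieces. A minimal sketch, assuming the 12-hour clock used above; the names StrictTimeRange, HOUR, MINUTE, TIME and containsTimeRange are invented here for illustration. It drops the redundant capturing groups but is meant to accept the same strings as the one-liner.

```java
import java.util.regex.Pattern;

public class StrictTimeRange {
    // Hour on a 12-hour clock: 1-12, with an optional leading zero for 1-9.
    static final String HOUR = "(?:1[0-2]|0?[1-9])";
    // Minute: 00-59.
    static final String MINUTE = "(?:0[0-9]|[1-5][0-9])";
    // A time is an hour, optionally followed by a colon and a minute.
    static final String TIME = HOUR + "(?::" + MINUTE + ")?";

    // Two times joined by a hyphen, guarded by the same lookarounds as before.
    static final Pattern TIME_RANGE = Pattern.compile(
        "(?<!\\d|-|:)(" + TIME + "-" + TIME + ")(?!\\d|-|:)");

    // Hypothetical helper: true if the text contains at least one range.
    static boolean containsTimeRange(String text) {
        return TIME_RANGE.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {"2:15-11:30", "10-3:45", "99:99-88:88", "2013-11-05"};
        for (String s : samples) {
            System.out.println(s + " -> " + containsTimeRange(s));
        }
    }
}
```

With the pieces named, switching to a 24-hour clock would only mean changing HOUR.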
answered 2013-11-05T00:48:21.250

The simple approach:

((\d{2}(:\d{2})?)-?){2}

A safer, more verbose regex:

([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?

Example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    private static final String TIME_FORMAT = "%02d:%02d";
    // Groups: 1 = start hour, 3 = start minute, 4 = end hour, 6 = end minute.
    private static final String TIME_RANGE = "([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?-([0-1]?[0-9]|[2][0-3])(:([0-5][0-9]))?";

    public static void main(String[] args) {
        String passage = "The time can look like this: 11:15-12:15 or 11-12:15 or 11-12 and so on.";
        Pattern pattern = Pattern.compile(TIME_RANGE);
        Matcher matcher = pattern.matcher(passage);
        int count = 0;

        // Print each range found, normalized to HH:mm - HH:mm.
        while (matcher.find()) {
            String time1 = formattedTime(matcher.group(1), matcher.group(3));
            String time2 = formattedTime(matcher.group(4), matcher.group(6));
            System.out.printf("Time #%d: %s - %s%n", count, time1, time2);
            count++;
        }
    }

    // Formats an hour and an optional minute as a zero-padded HH:mm string.
    private static String formattedTime(String strHour, String strMinute) {
        int intHour = parseInt(strHour);
        int intMinute = parseInt(strMinute);

        return String.format(TIME_FORMAT, intHour, intMinute);
    }

    // A group that did not participate in the match is null; treat it as 0.
    private static int parseInt(String str) {
        return str != null ? Integer.parseInt(str) : 0;
    }
}

Output:

Time #0: 11:15 - 12:15
Time #1: 11:00 - 12:15
Time #2: 11:00 - 12:00
answered 2013-11-04T22:58:39.067