I have this string:

Fully furnished self contained 2 bedroom suite just 5 minute walk to UVIC, available September 1.

Right now I am using preg_match to extract the date from it. This is the regular expression:

'/\bavailable\\s(?P<date_available>[?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?|immediately]+[\\s\d]+)[st|nd|rd|th]?/i'

At the moment this regular expression can extract the date from strings such as:

Available september 1st.
Available September 2nd
available september 3rd
available september 4th
available sept 1

Example output:

Array
(
    [0] => available September 1
    [date_available] => September 1
    [1] => September 1
)

But it does not match when the string is:

Available for september 1st.
Available in September 2nd
available since september 3rd
available at september 4th

Can anyone help me with this? Thanks.

3 Answers

Use an optional a-z wildcard of 2 to 5 letters (to match things like "on"):

$regex = '/\bavailable[ ]*(?:[a-z]{2,5})?[ ]*' .   // optional 2-5 letter word such as "on" or "since"
    '(?P<date_available>immediately|now|' .
    '(?:(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?' .
    '|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?' .   // Jun(?:e)? so that "Jun" matches as well as "June"
    '|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)' .
    '[ ]+[\d]+))' .
    //end <date_available>
    '(?:st|nd|rd|th)?/i';   // ordinal suffix is consumed but not captured

Usage:

$lines = array(
    'Fully furnished self contained 2 bedroom suite just 5 minute walk to UVIC is available now.',
    'bedroom suite just 5 minute walk to UVIC is available on September 34.',
    'bedroom suite just 5 minute walk to somewhere is available on Apr 1.',
    );

foreach ($lines as $line) {
    echo $line, "\n<br>\n";
    if (preg_match($regex, $line, $matches) === 1) {
        print_r($matches['date_available']);
    } else {
        echo "Does not match.";
    }
    echo "\n<br>\n";
}
answered 2012-08-23T08:43:37.803

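I actually couldn't get your regex to work at all. It looks as though you are trying to use character classes with square brackets [ ] where you need grouping and alternation with parentheses ( ).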
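
To illustrate the difference, here is a minimal sketch (the variable names and test strings are my own, not from the question). Inside square brackets `|` is just a literal character, so `[st|nd|rd|th]` matches exactly one character out of s, t, |, n, d, r, h, whereas `(?:st|nd|rd|th)` matches one of the four two-letter suffixes:

```php
<?php
// Character class: matches ONE character out of s, t, |, n, d, r, h.
$charClass = '/^[st|nd|rd|th]$/';
// Non-capturing group with alternation: matches one whole suffix.
$group = '/^(?:st|nd|rd|th)$/';

$classMatchesPipe   = preg_match($charClass, '|');  // 1: "|" is a literal inside [ ]
$classMatchesSuffix = preg_match($charClass, 'st'); // 0: two characters, the class takes only one
$groupMatchesSuffix = preg_match($group, 'st');     // 1: "st" is one of the alternatives
$groupMatchesPipe   = preg_match($group, '|');      // 0: "|" is the alternation operator here

var_dump($classMatchesPipe, $classMatchesSuffix, $groupMatchesSuffix, $groupMatchesPipe);
```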

Based on your requirements, the following is probably the shortest I can get it:

$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?((?:immediately|now)|(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Oct(?:ober)?|(?:Sept|Nov|Dec)(?:ember)?)\s+?\d{1,2}(?:st|nd|rd|th)?)/i';

This does not include a named subpattern, because the desired match is always in $matches[1]. If you do want a named subpattern, you can always put one in:

$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?(?P<date_available>(?:immediately|now)|(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Oct(?:ober)?|(?:Sept|Nov|Dec)(?:ember)?)\s+?\d{1,2}(?:st|nd|rd|th)?)/i';

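In response to @EthanB's earlier solution: it appears that you are not capturing the date's ordinal suffix (st, nd, rd, th). If that is intended and the suffix is not needed, you can make the pattern shorter by leaving it out altogether; there is no point in trying to match anything after the day number.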
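
As a rough sketch of what that means in practice, the two patterns below are cut down to a single month for brevity (they are not the full patterns from either answer, and the sample sentence is made up). The only difference is whether the suffix group sits outside or inside the named group:

```php
<?php
// Suffix OUTSIDE the named group: it is consumed by the match but not captured.
$outside = '/\bavailable\s+(?P<date_available>Sep(?:tember)?\s+\d{1,2})(?:st|nd|rd|th)?/i';
// Suffix INSIDE the named group: it becomes part of the captured date.
$inside  = '/\bavailable\s+(?P<date_available>Sep(?:tember)?\s+\d{1,2}(?:st|nd|rd|th)?)/i';

$line = 'Suite is available September 3rd.';

preg_match($outside, $line, $a);
preg_match($inside, $line, $b);

echo $a['date_available'], "\n"; // September 3
echo $b['date_available'], "\n"; // September 3rd
```

If only the month and day number are wanted, dropping the trailing suffix group from the first pattern gives the same capture.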

answered 2012-08-23T13:47:58.840

The following works for all of your examples, although I haven't put in your PHP "named subpattern" because I don't know the exact syntax:

\bavailable\s+(?:(?:for|in|at|since)\s+)?((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sept(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+\d{1,2}(?:st|nd|rd|th)?)
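
For what it's worth, a named subpattern can be added with the same `(?P<name>...)` syntax the question already uses. The sketch below assumes `/` delimiters and the case-insensitive `i` flag, neither of which is shown in the bare pattern above, and runs it against the sample strings from the question:

```php
<?php
// Same pattern as above, with the capturing group named date_available.
// The "/" delimiters and the "i" flag are assumptions added for preg_match.
$pattern = '/\bavailable\s+(?:(?:for|in|at|since)\s+)?'
    . '(?P<date_available>(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?'
    . '|Aug(?:ust)?|Sept(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)'
    . '\s+\d{1,2}(?:st|nd|rd|th)?)/i';

$lines = array(
    'Available for september 1st.',
    'Available in September 2nd',
    'available since september 3rd',
    'available at september 4th',
    'available sept 1',
);

// Collect the captured date from every line that matches.
$dates = array();
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $matches) === 1) {
        $dates[] = $matches['date_available'];
    }
}
print_r($dates);
```

Note that this pattern spells the short form as "Sept", so a line using "Sep" would not match.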
answered 2012-08-23T09:01:32.873