php - Why does this regex only capture one word?

Question

I'm trying to learn regex. I know the basics and I'm not terrible at it, I'm just not a pro - hence my question for you. If you know regex, I bet it will be simple.

What I have so far is this:

/(\w+)\s-{1}\s(\w+)\.{1}(\w{3,4})/

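I'm trying to write a small script for myself that tidies up my music collection by formatting all the file names. I know there are other tools out there already, but this is a learning experience for me. I have messed up all the titles before, ending up with things like "Hell Aint a Bad Place To Be" instead of "Hell Aint A Bad Place To Be". In my wisdom I somehow ended up with "Hell Aint a ad Place to be" (I was looking for an A followed by a space and an uppercase character). Obviously that was a nightmare and had to be fixed by hand. Needless to say, I now test on samples first.

Anyway, the regex above is the first stage of many. Eventually I want to build on it, but for now I just need to get the simple part working.

In the end I want to turn: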

"arctic Monkeys- a fake tales of a san francisco"

into

"Arctic Monkeys - A Fake Tales of a San Francisco"

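I know I will need look-behind assertions to grab the word right after the '-', because if that first word is 'a', 'of', etc. (which I would normally lowercase), I need to uppercase it (I know the above is a bad example of this use case).

Any way of fixing the existing regex would be great, and so would tips on where to look on my cheat sheet to finish the rest (I'm not looking for a full-blown answer because I need to learn to do it myself; I just can't figure out why \w+ only gets one word).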

score 1 · Accepted Answer

\w does not include whitespace. A working regex might be:

/^(.+?)\s*-\s*(.+)$/

Explanation:

^     - must start at the beginning of the string
(.+?) - match any character, be ungreedy
\s*   - match any amount of whitespace that might exist (including none)
-     - match a literal dash
\s*   - any whitespace again
(.+)  - remaining characters
$     - end of string

The case conversion would then happen in a separate replacement regex.
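To make the two stages concrete, here is a minimal sketch. The helper name split_and_format and the capitalisation rule (words of three or more characters get an initial capital, and the first character of each part is always upper case) are my own assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper, not from the answer: split on the dash, then fix case.
function split_and_format(string $song): ?string
{
    // Stage 1: the regex above captures everything before and after the dash.
    if (!preg_match('/^(.+?)\s*-\s*(.+)$/', $song, $m)) {
        return null; // no dash found
    }

    // Stage 2 (assumed rule): capitalise words of three or more characters,
    // then force the first character of each part to upper case.
    $fix = function (string $part): string {
        $part = preg_replace_callback(
            '/\b\w{3,}\b/',
            function (array $w): string {
                return ucfirst($w[0]);
            },
            $part
        );
        return ucfirst($part);
    };

    return $fix($m[1]) . ' - ' . $fix($m[2]);
}

echo split_and_format('arctic Monkeys- a fake tales of a san francisco'), PHP_EOL;
// Arctic Monkeys - A Fake Tales of a San Francisco
```

Keeping the split and the case conversion apart means each regex stays small enough to test on its own.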

score 1 · Accepted Answer

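I'm a little confused about what you're doing, but maybe this will help. Remember that + means one or more of the preceding element and * means zero or more. So you probably want something like ([\s]*) to match the spaces. You don't need to put {1} after a single character.

So maybe something like this: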

([\w\s]+)([\s]*)-([\s]*)([\w\s]+)\.([\w]{3,4})

I haven't tested this code, but I think you get the idea.
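A quick way to try that pattern is a sketch like the one below. The file name is a made-up sample, and the / delimiters are added because preg_match() requires them. Note that the first group is greedy and already takes the space before the dash, so the second group comes back empty and trim() is needed.

```php
<?php
// The answer's untested pattern, wrapped in delimiters for preg_match().
$pattern = '/([\w\s]+)([\s]*)-([\s]*)([\w\s]+)\.([\w]{3,4})/';

// Hypothetical file name, used only for illustration.
$file = 'arctic Monkeys - a fake tales.mp3';

if (preg_match($pattern, $file, $m)) {
    // Group 1 already swallows the space before the dash,
    // so group 2 ends up empty; trim() tidies the captured text.
    $artist    = trim($m[1]);
    $title     = trim($m[4]);
    $extension = $m[5];
    var_dump($artist, $title, $extension);
}
```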

score 1 · Accepted Answer

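I believe there is a simpler way to solve this: split the string into words with a simpler regex, then do whatever processing you need on those words. That lets you apply more complex transformations to the text in a cleaner way. Here is an example: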

<?php

$song = "arctic Monkeys- a fake tales of a san francisco";

// Split on spaces or - (the - is still present
// because it's only a lookahead match)
$words = preg_split("/([\s]+|(?=-))/", $song);

/*
Output for print_r:
Array
(
    [0] => arctic
    [1] => Monkeys
    [2] => -
    [3] => a
    [4] => fake
    [5] => tales
    [6] => of
    [7] => a
    [8] => san
    [9] => francisco
)
*/
print_r($words);

$new_words = array();
foreach ($words as $k => $word) {
        $new_words[] = processWord($word, $k, $words);
}

// This will output:
// Arctic Monkeys - A Fake Tales of a San Francisco
echo implode(' ', $new_words);

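// You can add as many processing rules as you want in here - in a very clean way
function processWord($word, $idx, $words) {
        // Always capitalise the first word and any word right after a dash
        // (checking $idx first avoids reading the undefined index -1)
        if ($idx === 0 || $words[$idx - 1] === '-') return ucfirst($word);
        // Otherwise capitalise only words longer than two characters
        return strlen($word) > 2 ? ucfirst($word) : $word;
}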

Here is an example of this code running: http://codepad.org/t6pc8WpR

score 0 · Accepted Answer

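For the first part, \w does not match a word, it matches a single word character. It is equivalent to [A-Za-z0-9_].

Instead, try ([A-Za-z0-9_ ]+) as your first bit (there is an extra space inside the square brackets, and the \s after it has been removed).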
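As a rough comparison, the sketch below runs the question's pattern and the modified one against a made-up file name. The sample name and the variable names are assumptions for illustration only.

```php
<?php
// Hypothetical file name with a two-word artist.
$name = 'arctic Monkeys - fake.mp3';

// Original pattern: (\w+) cannot cross the space, so only "Monkeys" is captured.
preg_match('/(\w+)\s-{1}\s(\w+)\.{1}(\w{3,4})/', $name, $before);

// First group replaced by a character class that also allows spaces,
// and the \s that followed it removed, as suggested above.
preg_match('/([A-Za-z0-9_ ]+)-{1}\s(\w+)\.{1}(\w{3,4})/', $name, $after);

var_dump($before[1]);        // the single word before the dash
var_dump(rtrim($after[1]));  // the whole artist name, trailing space trimmed
```

The second capture includes the space before the dash, which is why rtrim() is applied. The title group (\w+) has the same one-word limit and would need the same change for multi-word titles.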

score 0 · Accepted Answer

Here is what I have:

<?php
/**
 * Formats a string into a title:
 * * Pads all dashes with spaces.
 * * Uppercase all words with 3 letters or more.
 * * Uppercase first word and first words after dashes.
 *
 * @param $str
 *
 * @return string
 */
function format_title($str) {
    //Remove all whitespace before and after dashes.
    //(The spaces will return in the final product)
    $str = preg_replace("/\s*-\s*/", "-", $str);

    //Explode by dash.
    $string_split_by_dash = explode("-", $str);
    //For each sentence (separated by dashes)
    foreach ($string_split_by_dash as &$sentence) {
        //Uppercase all words.
        $sentence = ucwords($sentence);
        //Explode into words (by space)
        $words = explode(" ", $sentence);
        //For each word
        foreach ($words as &$word) {
            //If its length is smaller than 3
            if (strlen($word) < 3) {
                //Lowercase it.
                $word = strtolower($word);
            }
        }
        //Implode back into a sentence.
        $sentence = implode(" ", $words);
        //Uppercase the first word, regardless of length.
        $sentence = ucfirst($sentence);
    }

    //Implode all sentences back by space-padded dash.
    $str = implode(" - ", $string_split_by_dash);

    return $str;
}

$str = "arctic Monkeys- a fake tales of a san francisco";
var_dump(format_title($str));

I think it is more readable than a regex (and easier to document). It is probably more efficient too, though I haven't checked.
