php - Parse plain text in a way that recognises a custom if statement

Question

I have the following string:

$string = "The man has {NUM_DOGS} dogs.";

I parse it by running it through the following function:

function parse_text($string)
{
    global $num_dogs;

    $string = str_replace('{NUM_DOGS}', $num_dogs, $string);

    return $string;
}

parse_text($string);

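Here $num_dogs is a preset variable. Depending on $num_dogs, this could return any of the following strings:

The man has 1 dogs.
The man has 2 dogs.
The man has 500 dogs.

The problem is that in the case of "The man has 1 dogs", dogs is plural, which is undesired. I know this could be solved by not using the parse_text function at all and instead doing something like:

if($num_dogs == 1){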
    $string = "The man has 1 dog.";
}else{
    $string = "The man has $num_dogs dogs.";
}

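But in my application I am parsing more than just {NUM_DOGS}, and writing out all the conditions would take a lot of lines.

I need a shorthand that I can write into the initial $string and run through a parser, and that ideally does not limit me to just two true/false possibilities.

For example, let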

$string = 'The man has {NUM_DOGS} [{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"].';

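Is it clear what happens at the end? I am trying to build an array from the part inside the square brackets that comes after the vertical bar, then compare the keys of that array with the parsed value of {NUM_DOGS} (which by then will be the value of $num_dogs, on the left of the vertical bar), and return the value of the array entry with the matching key.

If that's not totally confusing, is it possible using the preg_* functions?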

score 12 · Accepted Answer

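The premise of your question is that you want to match a specific pattern and then replace it after performing additional processing on the matched text.

That makes it an ideal candidate for preg_replace_callback.

Regular expressions for capturing matched parentheses, quotes, braces and so on can become quite complicated, and doing it all with a regular expression is actually quite inefficient. If that is what you require, you really need to write a proper parser.

For this question I am going to assume a limited level of complexity and tackle it with a two-stage parse using regex.

First of all, the simplest regex I can think of for capturing tokens between curly braces: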

/{([^}]+)}/

Let's break that down.

{        # A literal opening brace
(        # Begin capture
  [^}]+  # Everything that's not a closing brace (one or more times)
)        # End capture
}        # Literal closing brace

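When applied with preg_match_all to a string such as 'A string {TOK_ONE} with {TOK_TWO|0=>"no", 1=>"one", 2=>"two"}', the results look something like this:

array (
  0 => array (
    0 => '{TOK_ONE}',
    1 => '{TOK_TWO|0=>"no", 1=>"one", 2=>"two"}',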
  ),
  1 => array (
    0 => 'TOK_ONE',
    1 => 'TOK_TWO|0=>"no", 1=>"one", 2=>"two"',
  ),
)

Looks good so far.
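
As a quick check, here is a minimal sketch of running that pattern through preg_match_all. The sample sentence is an assumption made for illustration, and the braces are escaped only for readability; the pattern is otherwise the one described above.

```php
<?php
// Sample input is an assumption made for illustration.
$subject = 'A string {TOK_ONE} with {TOK_TWO|0=>"no", 1=>"one", 2=>"two"}';

// $matches[0] holds the full matches (braces included),
// $matches[1] holds the text captured between the braces.
preg_match_all('/\{([^}]+)\}/', $subject, $matches);

print_r($matches[1]);
```

The captured strings in $matches[1] are what the second parsing stage works on.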

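Please note that if you have nested braces in your strings, e.g. {TOK_TWO|0=>"hi {x} y"}, this regex will not work. If that isn't going to be a problem, skip down to the next section.

It is possible to do top-level matching, but the only way I have ever managed it is with recursion. Most regex veterans will tell you that as soon as you add recursion to a regex, it stops being a regex.

This is where the additional processing complexity kicks in, and with long, complicated strings it is very easy to run out of stack space and crash your program. Use it carefully, if you need to use it at all.

The recursive regex is taken from one of my other answers and modified a little.

/{((?:[^{}]*|(?R))*)}/

Broken down: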

{                   # literal brace
(                   # begin capture
    (?:             # don't create another capture set
        [^{}]*      # everything not a brace
        |(?R)       # OR recurse
    )*              # none or more times
)                   # end capture
}                   # literal brace

And this time the output only matches the top-level braces:

array (
  0 => array (
    0 => '{TOK_ONE|0=>"a {nested} brace"}',
  ),
  1 => array (
    0 => 'TOK_ONE|0=>"a {nested} brace"',
  ),
)

Again, don't use the recursive regex unless you have to. (If your system has an old PCRE library, it may not even support recursion.)
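
To see the recursive pattern in action, a small sketch follows. The input string is made up for illustration, and the outer braces are escaped for readability.

```php
<?php
// Sample input is an assumption made for illustration.
$subject = 'x {TOK_ONE|0=>"a {nested} brace"} y {TOK_TWO}';

// (?R) recurses into the whole pattern, so an inner {...} pair is consumed
// as part of the outer match instead of ending it early.
preg_match_all('/\{((?:[^{}]*|(?R))*)\}/', $subject, $matches);

print_r($matches[1]);
```

Only the two top-level tokens are reported; {nested} stays inside the first capture.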

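With that out of the way, we need to work out whether the token has options associated with it. Instead of having two fragments to match, as in your question, I'd recommend keeping the options with the token, as in my examples: {TOKEN|0=>"option"}

Let's assume $match contains a matched token. If we check for a pipe | and take the substring of everything after it, we are left with your list of options, and we can again use a regex to parse them out. (Don't worry, I'll bring everything together at the end.)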

/(\d+)\s*=>\s*"([^"]*)",?/

Broken down:

(\d+)    # Capture one or more decimal digits
\s*      # Any amount of whitespace (allows you to do 0    =>    "")
=>       # Literal pointy arrow
\s*      # Any amount of whitespace
"        # Literal quote
([^"]*)  # Capture anything that isn't a quote
"        # Literal quote
,?       # Maybe followed by a comma

And an example match:

array (
  0 => array (
    0 => '0=>"no",',
    1 => '1 => "one",',
    2 => '2=>"two"',
  ),
  1 => array (
    0 => '0',
    1 => '1',
    2 => '2',
  ),
  2 => array (
    0 => 'no',
    1 => 'one',
    2 => 'two',
  ),
)

If you want to use quotes inside your quotes, you'll have to make your own recursive regex for it.
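
The split-then-parse step can be sketched as a small helper. The name split_token is hypothetical, and this sketch writes the key group as (\d+) so that multi-digit keys such as 10 are captured whole.

```php
<?php
// Hypothetical helper: splits NAME|0=>"a",1=>"b" into the token name
// and an array of options keyed by number.
function split_token(string $token): array
{
    $pipe = strpos($token, '|');
    if ($pipe === false) {
        return [$token, []]; // no options attached
    }

    $name = substr($token, 0, $pipe);
    $list = substr($token, $pipe + 1);

    // Group 1 is the numeric key, group 2 the quoted text.
    preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $list, $m);

    // Guard against an option list in which nothing matched.
    $options = $m[1] ? array_combine($m[1], $m[2]) : [];

    return [$name, $options];
}

[$name, $options] = split_token('TOK_TWO|0=>"no", 1 => "one", 10=>"ten"');
print_r($options);
```

Numeric string keys are converted to integers by PHP, so the options can be looked up directly with an integer value.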

To wrap up, here's a working example.

Some initialisation code.

$options = array(
    'WERE' => 1,
    'TYPE' => 'cat',
    'PLURAL' => 1,
    'NAME' => 2
);

$string = 'There {WERE|0=>"was a",1=>"were"} ' .
    '{TYPE}{PLURAL|1=>"s"} named bob' . 
    '{NAME|1=>" and bib",2=>" and alice"}';

Everything together.

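$string = preg_replace_callback('/{([^}]+)}/', function($match) use ($options) {
    // $match[1] is the text between the braces, e.g. WERE|0=>"was a",1=>"were"
    $name = $match[1];
    $list = '';

    // Split the token name from its option list at the first pipe, if any.
    if (false !== $pipe = strpos($name, '|')) {
        $list = substr($name, $pipe + 1);
        $name = substr($name, 0, $pipe);
    }

    // Unknown tokens are replaced with an empty string.
    if (!isset($options[$name])) {
        return '';
    }

    // No option list: substitute the value itself.
    if ($list === '') {
        return $options[$name];
    }

    // (\d+) captures the whole key, so multi-digit keys work too.
    preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $list, $pairs);
    $choices = $pairs[1] ? array_combine($pairs[1], $pairs[2]) : array();

    // Look up the option keyed by the token's value; a missing key yields ''.
    return isset($choices[$options[$name]]) ? $choices[$options[$name]] : '';
}, $string);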

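Please note that error checking is minimal; if you pick an option that doesn't exist, the result will not be what you expect (at best the token is replaced with nothing).

There's probably a much easier way to do all of this, but I just took the idea and ran with it.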
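
For reference, here is a sketch of the same approach applied to the sentence from the question, using the {TOKEN|...} form recommended in this answer rather than the square-bracket form. The render function name is hypothetical, and falling back to an empty string for a missing key is an assumption of this sketch.

```php
<?php
// Hypothetical wrapper; $values maps token names to their values.
function render(string $template, array $values): string
{
    return preg_replace_callback('/\{([^}]+)\}/', function ($m) use ($values) {
        // Split NAME|options into at most two parts.
        $parts = explode('|', $m[1], 2);
        $name  = $parts[0];

        if (!isset($values[$name])) {
            return ''; // unknown token
        }
        if (!isset($parts[1])) {
            return (string) $values[$name]; // plain {NAME}
        }

        preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $parts[1], $opt);
        $map = $opt[1] ? array_combine($opt[1], $opt[2]) : [];

        // Missing key: fall back to an empty string.
        return $map[$values[$name]] ?? '';
    }, $template);
}

$template = 'The man has {NUM_DOGS} '
    . '{NUM_DOGS|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry"}.';

echo render($template, ['NUM_DOGS' => 1]), PHP_EOL; // The man has 1 dog called fred.
echo render($template, ['NUM_DOGS' => 2]), PHP_EOL; // The man has 2 dogs called fred and harry.
```

Commas inside the quoted text are safe because the option pattern only stops at the closing quote.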

score 6 · Accepted Answer

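First off, it's a bit debatable, but if you can easily avoid it, just pass $num_dogs as an argument to the function, since most people consider global variables evil!

Next, to get the "s", I generally do this: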

$dogs_plural = ($num_dogs == 1) ? '' : 's';

Then do something like this:

$your_string = "The man has $num_dogs dog$dogs_plural";

It's essentially the same as an if/else block, but with fewer lines of code, and you only have to write the text once.
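
Putting those two points together, a minimal sketch of parse_text() that takes the count as an argument instead of reading a global. The {S} placeholder is made up for this sketch and is not part of the question's template syntax.

```php
<?php
// Same idea as parse_text() from the question, but the count is passed in
// as an argument instead of being read from a global.
function parse_text(string $string, int $num_dogs): string
{
    $dogs_plural = ($num_dogs == 1) ? '' : 's';

    // {S} is a made-up placeholder for the plural suffix.
    return str_replace(
        ['{NUM_DOGS}', '{S}'],
        [$num_dogs, $dogs_plural],
        $string
    );
}

echo parse_text('The man has {NUM_DOGS} dog{S}.', 1), PHP_EOL; // The man has 1 dog.
echo parse_text('The man has {NUM_DOGS} dog{S}.', 2), PHP_EOL; // The man has 2 dogs.
```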

As for the other part, I am still confused about what you're trying to do, but I believe you are looking for some way to convert

[{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"]

into:

switch ($num_dogs) {
    case 0:
        return 'dogs';
        break;
    case 1:
        return 'dog called fred';
        break;
    case 2:
        return 'dogs called fred and harry';
        break;
    case 3:
        return 'dogs called fred, harry and buster';
        break;
}

The easiest way would be to use a combination of explode() and regular expressions to get it to do something like what I have above.
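
One possible reading of that suggestion, as a rough sketch: a regex pulls out the key/text pairs, explode() separates each key from its text, and the loop plays the role of the switch. The pick_option name and the fragment format are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the text whose key equals $value from a
// fragment such as: 0=>"dogs",1=>"dog called fred"
function pick_option(string $fragment, int $value): string
{
    // A plain explode(',') would also split on commas inside the quotes,
    // so the pairs are pulled out with a regex first.
    preg_match_all('/\d+\s*=>\s*"[^"]*"/', $fragment, $pairs);

    foreach ($pairs[0] as $pair) {
        // explode() separates the key from the quoted text.
        [$key, $text] = explode('=>', $pair, 2);
        if ((int) trim($key) === $value) {
            // Strip surrounding whitespace, then the quotes.
            return trim(trim($text), '"');
        }
    }

    return ''; // no matching case
}

$fragment = '0=>"dogs",1=>"dog called fred",3=>"dogs called fred, harry and buster"';
echo pick_option($fragment, 3), PHP_EOL; // dogs called fred, harry and buster
```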

score 6 · Accepted Answer

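In a pinch, I've done something similar to what you're asking, with an implementation a little like the code below.

This is nowhere near as feature-rich as @Mike's answer, but it has done the trick in the past.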

/**
 * This function pluralizes words, as appropriate.
 *
 * It is a completely naive, example-only implementation.
 * There are existing "inflector" implementations that do this
 * quite well for many/most *English* words.
 */
function pluralize($count, $word)
{
    if ($count === 1)
    {
        return $word;
    }
    return $word . 's';
}

/**
 * Matches template patterns in the following forms:
 *   {NAME}       - Replaces {NAME} with value from $values['NAME']
 *   {NAME:word}  - Replaces {NAME:word} with 'word', pluralized using the pluralize() function above.
 */
function parse($template, array $values)
{
    $callback = function ($matches) use ($values) {
        $number = $values[$matches['name']];
        if (array_key_exists('word', $matches)) {
            return pluralize($number, $matches['word']);
        }
        return $number;
    };

    $pattern = '/\{(?<name>.+?)(:(?<word>.+?))?\}/i';
    return preg_replace_callback($pattern, $callback, $template);
}

Here are some examples that resemble your original question:

echo parse(
    'The man has {NUM_DOGS} {NUM_DOGS:dog}.' . PHP_EOL,
    array('NUM_DOGS' => 2)
);

echo parse(
    'The man has {NUM_DOGS} {NUM_DOGS:dog}.' . PHP_EOL,
    array('NUM_DOGS' => 1)
);

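The output is:

The man has 2 dogs.

The man has 1 dog.

It's perhaps worth mentioning that in larger projects I've invariably ended up ditching any custom-rolled inflection in favour of GNU gettext, which seems to be the sanest way forward once multiple languages are a requirement.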

score 0 · Accepted Answer

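This is copied from an answer that flussence posted in 2009 in response to this question:

You might want to look at the gettext extension. More specifically, it sounds like ngettext() will do what you want: it pluralises words correctly as long as you have a number to count from.

print ngettext('odor', 'odors', 1); // prints "odor"
print ngettext('odor', 'odors', 4); // prints "odors"
// ngettext() only picks the string; printf() fills in the %d placeholder
printf(ngettext('%d cat', '%d cats', 4), 4); // prints "4 cats"

You can also make it handle translated plural forms properly, which is its main purpose, although that takes quite a lot of extra work.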
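
Since ngettext() only chooses between the two strings and leaves any %d placeholder untouched, a small wrapper can combine it with sprintf(). This is a sketch only: count_phrase is a hypothetical name, and the fallback branch for installs without the gettext extension applies the English singular/plural rule only.

```php
<?php
// Hypothetical helper: picks the singular or plural format string and
// then substitutes the count into it.
function count_phrase(string $singular, string $plural, int $n): string
{
    if (function_exists('ngettext')) {
        $format = ngettext($singular, $plural, $n);
    } else {
        // Fallback without the gettext extension (English rule only).
        $format = ($n === 1) ? $singular : $plural;
    }

    return sprintf($format, $n);
}

echo count_phrase('%d cat', '%d cats', 1), PHP_EOL;
echo count_phrase('%d cat', '%d cats', 4), PHP_EOL;
```

With no translation catalogue loaded, both branches choose the singular form only when the count is exactly 1.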
