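I'm working on a template class and am having trouble parsing a list of quoted strings out of a string argument list. Take this string, for example: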

$string = 'VAR_SELECTED, \'Hello m\'lady\', "null"';

I'm having trouble coming up with a regular expression that extracts the strings "Hello m'lady" and "null". The closest I've got is

$string = 'VAR_SELECTED, \'Hello m\'lady\', "null", \'TE\'ST\'';
preg_match_all('/(?:[^\']|\\\\.)+|(?:[^"]|\\\\.)+/', $string, $matches);
print_r($matches);

Which outputs:

Array
(
    [0] => Array
        (
            [0] => VAR_SELECTED, 
            [1] => 'Hello m'lady', 
            [2] => "null", 
            [3] => 'TE'ST'
        )

)

However, a more complex case:

$string = 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"';
preg_match_all('/(?:[^\']|\\\\.)+|(?:[^"]|\\\\.)+/', $string, $matches);
print_r($matches);  

outputs:

Array
(
    [0] => Array
        (
            [0] => VAR_SELECTED, 
            [1] => 'Hello 
            [2] => "Father"
            [3] => ', 
            [4] => "Hello 
            [5] => 'Luke'
            [6] => "
        )

)

Can anyone help me solve this? Are multiple regular expressions the way forward?

Edit: Maybe it would be easier to replace the commas inside the strings with a placeholder and then split the string with explode?

Edit 2: Just thought of a simple, unsafe option (which I won't use), though it generates an E_NOTICE.

$string = 'return array(VAR_SELECTED, \'Hello , "Father"\', "Hello \'Luke\'4");';
$string = eval($string);
print_r($string);

3 Answers

Try this:

/(?<=^|[\s,])(?:(['"]).*?\1|[^\s,'"]+)(?=[\s,]|$)/

Or, as a PHP single-quoted string literal:

'/(?<=^|[\s,])(?:([\'"]).*?\1|[^\s,\'"]+)(?=[\s,]|$)/'
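As a rough illustration, the pattern can be run against both strings from the question with preg_match_all. The variable names and the quote-stripping step are assumptions added for this sketch, not part of the answer. Group 1 holds the opening quote and is absent for bare tokens such as VAR_SELECTED.

```php
<?php
// Pattern from the answer, as a PHP single-quoted string literal.
$pattern = '/(?<=^|[\s,])(?:([\'"]).*?\1|[^\s,\'"]+)(?=[\s,]|$)/';

// The two sample strings from the question.
$inputs = [
    'VAR_SELECTED, \'Hello m\'lady\', "null"',
    'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"',
];

$results = [];
foreach ($inputs as $input) {
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);
    $tokens = [];
    foreach ($matches as $m) {
        // $m[1] is the opening quote for quoted tokens; it is missing
        // (or empty) when the bare-token alternative matched.
        $quoted = isset($m[1]) && $m[1] !== '';
        // Assumption: drop exactly one pair of outer quotes from quoted tokens.
        $tokens[] = $quoted ? substr($m[0], 1, -1) : $m[0];
    }
    $results[] = $tokens;
}

print_r($results);
```

A value such as 'rock 'n' roll' would still be cut short, because an inner quote followed by whitespace or a comma looks like a closing quote to the lookahead.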

That regex yields the desired result, but I think you're going about this wrong. Usually, if a quoted string needs to contain a literal quote character, the quote is escaped, either with a backslash or with another quote. You aren't doing that, so I had to use a fragile hack based on lookarounds. Are you sure the data isn't supposed to look like this?

$string = 'VAR_SELECTED, \'Hello m\\\'lady\', "null"';

$string = 'VAR_SELECTED, \'Hello "Father"\', "Hello \\\'Luke\\\'"';

Come to think of it, doesn't PHP have built-in support for CSV data?
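PHP does ship CSV parsing, for instance str_getcsv. Below is a minimal sketch, assuming the argument list can be written CSV-style: one enclosure character, no space between a comma and an opening quote, and inner enclosure characters doubled. The sample input is hypothetical and differs from the question's format, since str_getcsv accepts only one enclosure character at a time.

```php
<?php
// Hypothetical CSV-style argument list: double quotes enclose values,
// and a literal double quote inside a value is written twice.
$string = 'VAR_SELECTED,"Hello m\'lady","null","Hello ""Father"""';

// Delimiter, enclosure and escape character are passed explicitly.
$args = str_getcsv($string, ',', '"', '\\');

print_r($args);
```

Whether this fits depends on whether the template syntax can be constrained to a single quoting style.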

answered 2010-07-10T18:49:25.153

Here is how I would do it:

Break the task down into the component steps you want to perform:

1.) Explode the string on commas.

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>" \'Hello m\'lady\'"
[2]=>" "null""

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>" \'Hello "Father"\'"
[2]=>" "Hello \'Luke\'""

2.) Run trim on all three to get rid of any whitespace.

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"\'Hello m\'lady\'"
[2]=>""null""

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"\'Hello "Father"\'"
[2]=>""Hello \'Luke\'""

3.) Run str_replace(" \ "," ",$text) to get rid of the slashes. (Remove the spaces; they were added only for readability, so it should be a bare backslash and an "empty" string.)

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"'Hello m'lady'"
[2]=>""null""

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"'Hello "Father"'"
[2]=>""Hello 'Luke'""

4.) Run trim again, this time as trim($text, " ' " ") (remove the spaces; they were added only for readability).

For 'VAR_SELECTED, \'Hello m\'lady\', "null"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"Hello m'lady"
[2]=>"null"

For 'VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"' this gives me
[0]=>"VAR_SELECTED"
[1]=>"Hello "Father""
[2]=>"Hello 'Luke'"

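I haven't tested this, but the logic is sound. A quick and dirty way to test 98% of all regular expressions (in my experience) is http://rubular.com/, which is a great site. Usually, if it starts choking on a regex, that's my first sign that I should break the problem down further. (This is just an opinion ~puts on flame suit~)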
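The steps above can be sketched roughly as follows. The function name parseArgs is hypothetical, and the sketch departs from the steps in two places. It skips step 3, because the backslashes in the sample literals are PHP escapes that never reach the string at runtime. It also removes exactly one matching pair of outer quotes instead of calling trim with a character list, since trim strips every listed character from both ends and would also eat a quote that ends the value itself (the " after Father, for example).

```php
<?php
// Hypothetical helper: split on commas, trim whitespace, then remove
// one matching pair of outer quotes from each piece.
function parseArgs(string $input): array
{
    // Steps 1 and 2: explode on commas and trim surrounding whitespace.
    $parts = array_map('trim', explode(',', $input));

    // Step 4 (adapted): strip a single pair of matching outer quotes.
    return array_map(function (string $part): string {
        $len = strlen($part);
        $first = $len > 0 ? $part[0] : '';
        if ($len >= 2 && ($first === "'" || $first === '"') && $part[$len - 1] === $first) {
            return substr($part, 1, -1);
        }
        return $part;
    }, $parts);
}

print_r(parseArgs('VAR_SELECTED, \'Hello m\'lady\', "null"'));
print_r(parseArgs('VAR_SELECTED, \'Hello "Father"\', "Hello \'Luke\'"'));
```

Because the split happens before any quote handling, a comma inside a quoted value (as in the string from Edit 2 of the question) would still be treated as a separator.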

answered 2010-07-10T17:30:41.203

You want to use a backreference in the match pattern.

preg_match_all('@([\'"]).*[^\\\\]\1@', $string, $matches);

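This starts matching at the first instance of " or ', and then matches the longest string that ends with the same quote character, provided that closing quote is not escaped.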

Array
(
    [0] => Array
        (
            [0] => 'Hello m'lady', "null", 'TE'ST'
        )

    [1] => Array
        (
            [0] => '
        )

)
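If the data did escape inner quotes with a backslash (which the question's sample strings do not), the backreference idea could be made escape-aware so that each quoted value is matched separately instead of as one long run. The following is only a sketch under that assumption; the sample string and variable names are made up for illustration.

```php
<?php
// Assumption: inner quotes are backslash-escaped in the actual data.
// At runtime this string is: VAR_SELECTED, 'Hello m\'lady', "Hello \"Luke\""
$string = 'VAR_SELECTED, \'Hello m\\\'lady\', "Hello \\"Luke\\""';

// Group 1: the opening quote.
// Group 2: a run of escaped characters (backslash + any character) or
//          characters that are neither the opening quote nor a backslash.
// \1:      the matching closing quote.
$pattern = '/([\'"])((?:\\\\.|(?!\1)[^\\\\])*)\1/s';

preg_match_all($pattern, $string, $matches);

// Remove the escaping backslashes from each captured value.
$values = array_map(function ($value) {
    return preg_replace('/\\\\(.)/s', '$1', $value);
}, $matches[2]);

print_r($values);
```

Bare tokens such as VAR_SELECTED are not captured by this pattern and would need a separate alternative.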
answered 2010-07-10T17:15:58.223