
In PHP, given a long block of text such as:

Ms. Kane, who was elected attorney general last year and has been mentioned as a possible future candidate for governor, struck a political note in her brief announcement to an audience that cheered and applauded her decision.

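“I looked at it this way, the governor’s going to be O.K.,” she said. She wondered, she added, who would represent “the Daves and Robbies, who represents the Emilys and Amys?”

“As attorney general,” she said, “I choose you.”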

I want to extract all of the quoted material, in this case an array containing these results:

"I looked at it this way, the governor’s going to be O.K.,"
"the Daves and Robbies, who represents the Emilys and Amys?"
"As attorney general,"
"I choose you."

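Assumptions:

  • There will always be a matching opening and closing quote
  • Plain double quotes

Bonus points if your solution also handles curly quotes, single quotes, and other special cases, but feel free to assume plain double quotes if that makes it easier.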

And yes, I have searched the site for answers. Some things looked helpful, but I did not find anything that worked. The closest was this, but no dice:

preg_match_all('/"([^"]*(?:\\"[^"]*)*)"/', $content, $matches)

5 Answers


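You could try splitting the string in PHP.

Pseudocode:

Split everything into an array using " as the delimiter, then use % (modulo 2) to keep only the "in-between" pieces of the array, that is, the elements at odd indexes. To handle curly quotes and the like, just convert every instance to a straight quote first.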
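The pseudocode above might look roughly like this. It is only a sketch, assuming the quotes are balanced and that normalizing curly double quotes to straight ones is acceptable; the function name `extractQuoted` is hypothetical.

```php
<?php
// Hypothetical helper illustrating the split-and-keep-odd-pieces idea.
function extractQuoted(string $text): array
{
    // Convert curly double quotes to straight ones first.
    $text = str_replace(["\u{201C}", "\u{201D}"], '"', $text);

    // After splitting on ", the pieces at odd indexes sit between
    // an opening quote and its closing quote.
    $pieces = explode('"', $text);
    $quoted = [];
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 1) {
            $quoted[] = $piece;
        }
    }
    return $quoted;
}

print_r(extractQuoted('He said "hello" and then "goodbye".'));
```

If a quote is left unclosed, the trailing piece would be picked up as well, so this relies on the balanced-quotes assumption from the question.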

Answered on 2013-07-11T21:49:33.367

$string = 'Ms. Kane, who was elected attorney general last year and has been mentioned as a possible future candidate for governor, struck a political note in her brief announcement to an audience that cheered and applauded her decision.

“I looked at it this way, the governor’s going to be O.K.,” she said. She wondered, she added, who would represent “the Daves and Robbies, who represents the Emilys and Amys?”

“As attorney general,” she said, “I choose you.”';

// Normalize quotes
$search = array("\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\x98", "\xe2\x80\x99"); 
$replace = array('"', '"', "'", "'");
$newstring = str_replace($search, $replace, $string);

// Extract text
$regex = "/\"(.*)\"/U";  
preg_match_all ($regex, $newstring, $output);  

if (!empty($output[1])) { // $output[1] is always set, so test for actual matches
    print_r($output[1]);
} else {
    echo $newstring;
}

This should give:

Array
(
    [0] => I looked at it this way, the governor's going to be O.K.,
    [1] => the Daves and Robbies, who represents the Emilys and Amys?
    [2] => As attorney general,
    [3] => I choose you.
)
Answered on 2013-07-11T21:50:18.917

You could use this:

$matches = array();
preg_match_all('/(\“.*\”)/U', str_replace("\n", " ", $str), $matches);
print_r($matches);

Note that I am replacing newlines with spaces, so it also matches quotations that start on one line and end on another.
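As an alternative to replacing the newlines, the `s` modifier lets `.` match newline characters, and `u` makes PCRE treat the pattern and subject as UTF-8. A rough sketch, assuming the text is valid UTF-8 and uses curly double quotes; the sample `$str` is made up for illustration.

```php
<?php
// Made-up sample: the second quotation spans two lines.
$str = "\u{201C}First,\u{201D} she said, \u{201C}second\nline.\u{201D}";

// s: the dot also matches newlines; u: UTF-8 mode.
// .*? is the lazy quantifier, the same effect the U modifier gives to .*
preg_match_all('/\x{201C}(.*?)\x{201D}/su', $str, $matches);

print_r($matches[1]); // the captured text, without the surrounding quotes
```

Here the capture group sits inside the quotes, so the results exclude the quote characters themselves.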

Answered on 2013-07-11T21:55:17.167

One of the simplest approaches, though not the best, is to use strpos() to find the occurrences of " and then substr() to cut out the string.

$string = 'Your long text "with quotation"';

$occur = strpos($string, '"'); // the first occurrence of "
$occur2 = strpos($string, '"', $occur + 1); // the second occurrence of "

$start = $occur; // where the cut starts
$length = $occur2 - $occur + 1; // length of the quoted text, quotes included

$res = substr($string, $start, $length); // your quoted text, e.g. "with quotation"

You can put this in a loop to get multiple quoted strings:

    $string = 'Your long text "with quotation" Another long text "and text with quotation"';

    $offset = 0; // where the next search starts; 0 means from the beginning
    $resString = ''; // if you want a string and not an array
    $res = array();

    // Stop as soon as there is no further opening or closing quote
    while (($occur = strpos($string, '"', $offset)) !== false
        && ($occur2 = strpos($string, '"', $occur + 1)) !== false) {
        $length = $occur2 - $occur + 1; // length of the quoted text, quotes included

        $res[] = substr($string, $occur, $length); // collect the results as an array
        $resString .= substr($string, $occur, $length); // or as one concatenated string

        $offset = $occur2 + 1; // continue searching after the closing quote
    }

    echo $resString . '<br>';
    print_r($res);

Result:

 "with quotation""and text with quotation"
 or
 Array ( [0] => "with quotation" [1] => "and text with quotation" )

A simple way without regular expressions; I hope it helps someone :) (sorry for my poor English)

Answered on 2013-07-11T22:20:09.473

You can do it like this:

<meta charset="UTF-8" />
<pre>
<?php
$pattern = '~(?|"((?>[^"]++|(?<=\\")")*)"|“((?>[^”]++|(?<=\\”)”)*)”)~u';

$text = <<<LOD
Ms. Kane, who was elected attorney general last year and has been mentioned as a possible future candidate for governor, struck a political note in her brief announcement to an audience that cheered and applauded her decision.

“I looked at it this way, the governor’s going to be O.K.,” she said. She wondered, she added, who would represent “the Daves and Robbies, who represents the Emilys and Amys?”

“As attorney general,” she said, “I choose you.”
LOD;

preg_match_all ($pattern, $text, $matches);
print_r($matches[1]);

Since you are dealing with Unicode characters, you must add the u modifier at the end of the pattern.
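To see why the u modifier matters, consider a character class containing both curly quotes. Without u, PCRE reads the class byte by byte, so it matches individual bytes of each multi-byte quote; with u, it matches whole characters. A small sketch with made-up sample text, assuming UTF-8 input:

```php
<?php
// Made-up sample text: one letter wrapped in curly double quotes.
$text = "\u{201C}a\u{201D}";

// Character class holding both curly quotes (raw UTF-8 bytes in the pattern).
$class = "[\u{201C}\u{201D}]";

$withoutU = preg_match_all('/' . $class . '/', $text);  // class is read as single bytes
$withU    = preg_match_all('/' . $class . '/u', $text); // class is read as two characters

var_dump($withoutU, $withU);
```

With u the count equals the number of quote characters; without it the count is higher, because each byte of a quote is matched separately.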

You can easily add whatever else you need to the pattern in the same way, for example single quotes:

$pattern = '~(?|"((?>[^"]++|(?<=\\")")*)"|“((?>[^”]++|(?<=\\”)”)*)”|\'((?>[^\']++|(?<=\\\')\')*)\')~u';

Note that the structure is always the same:

(?|
    "((?>[^"]++|(?<=\\")")*)"
  |
    “((?>[^”]++|(?<=\\”)”)*)”
  |
    \'((?>[^\']++|(?<=\\\')\')*)\'
)
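Because every branch has the same shape, the alternation could also be generated from a list of delimiter pairs. The following is a simplified sketch, not the pattern above: it assumes escaped quotes do not need handling, so each branch is just "open, any non-closing characters, close". The name `buildQuotePattern` and the pair list are hypothetical.

```php
<?php
// Hypothetical helper: one branch per [open, close] pair inside a branch-reset
// group (?|...), so the quoted text always lands in capture group 1.
function buildQuotePattern(array $pairs): string
{
    $branches = [];
    foreach ($pairs as [$open, $close]) {
        $o = preg_quote($open, '~');
        $c = preg_quote($close, '~');
        // Simplification: take everything up to the next closing delimiter.
        $branches[] = $o . '([^' . $c . ']*+)' . $c;
    }
    return '~(?|' . implode('|', $branches) . ')~u';
}

$pattern = buildQuotePattern([['"', '"'], ["\u{201C}", "\u{201D}"], ["'", "'"]]);
preg_match_all($pattern, "\u{201C}I choose you.\u{201D} and \"plain\"", $matches);
print_r($matches[1]);
```

Be aware that a single-quote branch will also react to apostrophes inside words such as "governor's", so it is probably best left out for ordinary prose.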
Answered on 2013-07-12T00:26:28.387