php - Searching a string with a regular expression? Is there a better way?

Question

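I want to search a text file in PHP with a special condition: start collecting text when the string is matched the first time, and stop collecting when the same string is matched a second time.

For example, if the string is "world" and the search runs over the line below: "Our world has 196 countries, but only 192 of them are members of the United Nations. Our world is extraordinary."

Then I want this text: 'has 196 countries, but only 192 of them are members of the United Nations. Our' in the matches variable.

I tried many regular expressions with preg_match() but got no result. Is there a better way to do this?

Thanks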

score 2 · Accepted Answer

Use a lookbehind and a lookahead:

/(?<=world ).*?(?= world)/

See it in action here: http://regex101.com/r/tW2bT8

...and here is a demo using PHP: http://codepad.viper-7.com/DucTKE
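
A minimal sketch of how that pattern could be applied with preg_match(), assuming the text is already held in a string. The variable names are illustrative and not taken from the linked demo. Two things here are additions rather than part of the answer: the keyword is passed through preg_quote(), and the `s` modifier is set so that `.` also matches newlines in case the two occurrences sit on different lines.

```php
<?php
// Sample sentence from the question.
$text = 'Our world has 196 countries, but only 192 of them are members '
      . 'of the United Nations. Our world is extraordinary.';

// Keyword that opens and closes the capture (assumed to be plain text).
$keyword = 'world';
$quoted  = preg_quote($keyword, '/');

// Lookbehind: the match must be preceded by "world ".
// Lookahead: the match must be followed by " world".
// .*? is lazy, so it stops at the first following occurrence.
$pattern = '/(?<=' . $quoted . ' ).*?(?= ' . $quoted . ')/s';

$matches = array();
if (preg_match($pattern, $text, $matches)) {
    echo $matches[0], PHP_EOL;
}
```

Because the lookarounds consume no characters, the keyword itself and the adjacent spaces are left out of `$matches[0]`.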

score 0 · Accepted Answer

$lines = file($filename);
$keep = false;
$keepTrailing = true; // Flag that decides whether to keep a trailing (unclosed) capture segment.
$extractions = array();
$current = '';
foreach($lines as $line){
    $parts = preg_split('/\bworld\b/i', $line);
    $current .= $parts[0];
    for ($i = 1; $i<count($parts); $i++){
        if ($keep) $extractions[] = $current;
        $keep = !$keep;
        $current = $parts[$i];
    }
}
if ($keep && $keepTrailing)
    $extractions[] = $current;
var_dump($extractions);

Here it is in action.

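Basically, by walking through the file once, you can simply split each line on the keyword ("world"). I used \b anchors to make sure it doesn't split on "worldly" or other junk. I've included a flag that decides whether to keep a trailing capture segment, meaning one that a keyword opens but no later keyword closes. You don't necessarily need it, but it might help. The only non-intuitive part of the solution is keeping the current capture in a $current variable, which lets a capture continue across multiple lines.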
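
To illustrate the effect of the \b anchors, here is a small sketch on a made-up sentence (the sample text is an assumption, not from the question). It shows that preg_split() leaves "worldly" intact while still splitting on "world" and, because of the `i` modifier, on "World".

```php
<?php
// Hypothetical sample line containing a near-miss ("worldly") and two real hits.
$line = 'A worldly view of the world, and the World again.';

// \b requires a word boundary on both sides of the keyword,
// so "worldly" does not count as an occurrence.
$parts = preg_split('/\bworld\b/i', $line);

// Two keyword hits produce three pieces.
print_r($parts);
```

Without the anchors, the pattern /world/i would also split inside "worldly" and toggle the capture state one time too many.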

You know, this could easily be turned into a function.

function capturingSearchWithKeyword($filename, $keyword, $keepTrailing = true, $trim = false){
    $lines = file($filename);
    $keep = false;
    $extractions = array();
    $current = '';
    foreach($lines as $line){
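        // preg_quote() keeps regex metacharacters in $keyword from being interpreted.
        $parts = preg_split('/\b' . preg_quote($keyword, '/') . '\b/i', $line);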
        $current .= $parts[0];
        for ($i = 1; $i<count($parts); $i++){
            if ($keep){
                if ($trim) $current = trim($current);
                $extractions[] = $current;
            }
            $keep = !$keep;
            $current = $parts[$i];
        }
    }
    if ($keep && $keepTrailing)
        $extractions[] = $current;
    return $extractions;
}

Check it out...
