10

I'm looking for a simple way to find the matching part of two strings in PHP (specifically in the context of URIs).

For example, consider the two strings:

http://2.2.2.2/~machinehost/deployment_folder/

/~machinehost/deployment_folder/users/bob/settings

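What I need is to cut the part that the two strings share off the second string, leaving:

users/bob/settings

so that I can then prepend the first string to form an absolute URI.

Is there a simple way (in PHP) to compare two arbitrary strings and find the substrings they have in common?

Edit: as pointed out, I mean the longest matching substring common to both strings.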

4

5 Answers

2

Assuming your strings are $a and $b, respectively, you can use this:

$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';

$len_a = strlen($a);
$len_b = strlen($b);

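// $p counts the trailing characters of $b that are not part of the overlap.
// The overlap cannot be longer than $a, so start at max(0, $len_b - $len_a).
for ($p = max(0, $len_b - $len_a); $p < $len_b; $p++)
    // Compare the tail of $a with a head of $b of the same length (strict comparison).
    if (substr($a, $len_a - ($len_b - $p)) === substr($b, 0, $len_b - $p))
        break;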

$result = $a.substr($b, $len_b - $p);

echo $result;

This outputs http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings

answered 2010-11-25T23:56:19.010
1

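Finding the longest common match can also be done with a regular expression.

The function below takes two strings, builds a regular expression from one of them, and runs it against the other.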

/**
 * Determine the longest common match within two strings
 *
 * @param string $str1
 * @param string $str2 Two strings in any order.
 * @param boolean $case_sensitive Set to true to force
 * case sensitivity. Default: false (case insensitive).
 * @return string The longest string - first match.
 */
function get_longest_common_subsequence( $str1, $str2, $case_sensitive = false ) {
    // First check to see if one string is the same as the other.
    if ( $str1 === $str2 ) return $str1;
    if ( ! $case_sensitive && strtolower( $str1 ) === strtolower( $str2 ) ) return $str1;

    // We'll use '#' as our regex delimiter. Any character can be used as we'll quote the string anyway,
    $delimiter = '#';

    // We'll find the shortest string and use that to check substrings and create our regex.
    $l1 = strlen( $str1 );
    $l2 = strlen( $str2 );
    $str = $l1 <= $l2 ? $str1 : $str2;
    $str2 = $l1 <= $l2 ? $str2 : $str1;
    $l = min( $l1, $l2 );

    // Next check to see if one string is a substring of the other.
    if ( $case_sensitive ) {
        if ( strpos( $str2, $str ) !== false ) {
            return $str;
        }
    }
    else {
        if ( stripos( $str2, $str ) !== false ) {
            return $str;
        }
    }

    // Regex for each character will be of the format (?:a(?=b))?
    // We also need to capture the last character, but this prevents us from matching strings with a single character. (?:.|c)?
    $reg = $delimiter;
    for ( $i = 0; $i < $l; $i++ ) {
        $a = preg_quote( $str[ $i ], $delimiter );
        $b = $i + 1 < $l ? preg_quote( $str[ $i + 1 ], $delimiter ) : false;
        $reg .= sprintf( $b !== false ? '(?:%s(?=%s))?' : '(?:.|%s)?', $a, $b );
    }
    $reg .= $delimiter;
    if ( ! $case_sensitive ) {
        $reg .= 'i';
    }
    // Resulting example regex from a string 'abbc':
    // '#(?:a(?=b))?(?:b(?=b))?(?:b(?=c))?(?:.|c)?#i';

    // Perform our regex on the remaining string
    $str = $l1 <= $l2 ? $str2 : $str1;
    if ( preg_match_all( $reg, $str, $matches ) ) {
        // $matches is an array with a single array with all the matches.
        return array_reduce( $matches[0], function( $a, $b ) {
            $al = strlen( $a );
            $bl = strlen( $b );
            // Return the longest string, as long as it's not a single character.
            return $al >= $bl || $bl <= 1 ? $a : $b;
        }, '' );
    }

    // No match - Return an empty string.
    return '';
}

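It builds the regular expression from the shorter of the two strings, although performance is most likely the same either way. It may incorrectly match strings with recurring substrings, and it is limited to matches of two or more characters unless the strings are equal or one is a substring of the other. For example: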

// Works as intended.
get_longest_common_subsequence( 'abbc', 'abc' ) === 'ab';

// Returns incorrect substring based on string length and recurring substrings.
get_longest_common_subsequence( 'abbc', 'abcdef' ) === 'abc';

// Does not return any matches, as all recurring strings are only a single character long.
get_longest_common_subsequence( 'abc', 'ace' ) === '';

// One of the strings is a substring of the other.
get_longest_common_subsequence( 'abc', 'a' ) === 'a';

In any case, it works using an alternative approach, and the regular expression could be refined to handle the other cases.

answered 2017-12-15T03:37:57.503
0

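I'm not sure I understand all of your requirements, but the idea is:

Let A be your URL and B be your "/~machinehost/deployment_folder/users/bob/settings".

  • Search for B in A -> you get an index i (where i is the position in A of B's first /)
  • Let l = length(A)
  • Cut B from (l - i) to length(B) to get the last part of B (/users/bob/settings)

I haven't tested it, but if you really need it, I can help you make this brilliant (ironic) solution work.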
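The steps above can be sketched roughly as follows. This is only one reading of the idea: the helper name cut_overlap is hypothetical, and it assumes the overlap is a tail of A that is also a head of B. It scans A from the left, so the first hit is the longest such overlap.

```php
<?php
// Hypothetical helper (not from the original answer): removes from the start
// of $b the part that overlaps the end of $a, and returns what is left of $b.
function cut_overlap(string $a, string $b): string
{
    if ($b === '') {
        return '';
    }

    $l = strlen($a);
    // Candidate positions are where the first character of $b occurs in $a.
    $i = strpos($a, $b[0]);

    while ($i !== false) {
        $overlap = $l - $i; // length of the tail of $a starting at index $i

        // Does that tail of $a match the beginning of $b?
        if (strncmp(substr($a, $i), $b, $overlap) === 0) {
            return substr($b, $overlap);
        }

        // Try the next occurrence, which gives a shorter tail.
        $i = strpos($a, $b[0], $i + 1);
    }

    // No overlap found: nothing to cut.
    return $b;
}

$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';
echo $a . cut_overlap($a, $b), "\n";
```

Note that in this sketch the remainder comes out without a leading slash (users/bob/settings), because the trailing slash of A is part of the overlap.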

Note that a regular expression could probably be used as well, for example:

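// Quote $B so that it is matched literally, and wrap the pattern in delimiters.
// A greedy group is used because a lazy (.*?) at the end would always capture nothing.
$pattern = '#' . preg_quote($B, '#') . '(.*)#';
$res = array();
preg_match_all($pattern, $A, $res);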

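Edit: I think your last comment invalidates my answer. What you want is to find common substrings, so you could start with a brute-force algorithm that tries to find B[1:i] in A for i in {2, ..., length(B)}, and then move on to a dynamic programming approach.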
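For reference, a minimal sketch of the dynamic programming variant might look like this. The function name longest_common_substring is made up for illustration, and the code works on bytes, so it assumes single-byte text such as plain ASCII URIs. Each table cell holds the length of the common run ending at that pair of characters, and only two rows are kept in memory.

```php
<?php
// Hypothetical helper (not from the original answer): longest common
// substring of $a and $b via dynamic programming, O(len(a) * len(b)) time.
function longest_common_substring(string $a, string $b): string
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    $best = 0; // length of the longest common run found so far
    $endA = 0; // index in $a just past the end of that run
    $prev = array_fill(0, $lenB + 1, 0);

    for ($i = 1; $i <= $lenA; $i++) {
        $curr = array_fill(0, $lenB + 1, 0);
        for ($j = 1; $j <= $lenB; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Extend the common run that ended at the previous characters.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best) {
                    $best = $curr[$j];
                    $endA = $i;
                }
            }
        }
        $prev = $curr;
    }

    return substr($a, $endA - $best, $best);
}

$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';
echo longest_common_substring($a, $b), "\n"; // /~machinehost/deployment_folder/
```

When several common substrings share the maximum length, this sketch returns the one that ends first in the first argument.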

answered 2010-11-25T23:55:48.443
0

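There doesn't seem to be any out-of-the-box code for what you need, so let's look for a simple approach.

For this exercise I used two methods: one to find the longest match and another to cut off the matching part.

The FindLongestMatch() method takes one path apart and looks for matches in the other path piece by piece, keeping only a single match, the longest one (no arrays, no sorting). The RemoveLongestMatch() method takes the suffix, or "remainder", that follows the position of the longest match found.

Here is the complete source code: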

<?php

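function FindLongestMatch($relativePath, $absolutePath)
{
    static $_separator = '/';
    // Walk the segments of the absolute path from last to first.
    $splitted = array_reverse(explode($_separator, $absolutePath));

    // Initialize both accumulators to avoid undefined-variable notices.
    $match = '';
    $longestMatch = '';

    foreach ($splitted as $value)
    {
        // Grow the candidate by prepending the next segment.
        $matchTest = $value.$_separator.$match;
        if(IsSubstring($relativePath, $matchTest))
            $match = $matchTest;

        if (!empty($value) && IsNewMatchLonger($match, $longestMatch))
            $longestMatch = $match;
    }

    return $longestMatch;
}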

//Removes from the first string the longest match.
function RemoveLongestMatch($relativePath, $absolutePath)
{
    $match = findLongestMatch($relativePath, $absolutePath);
    $positionFound = strpos($relativePath, $match);     
    $suffix = substr($relativePath, $positionFound + strlen($match));

    return $suffix;
}

function IsNewMatchLonger($match, $longestMatch)
{
    return strlen($match) > strlen($longestMatch);
}

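// Note: strpos() returns 0 for a match at the very start of $string,
// so this check treats a match at position 0 as "not found".
function IsSubstring($string, $subString)
{
    return strpos($string, $subString) > 0;
}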

Here is a representative subset of the test cases:

//TEST CASES
echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://1.1.1.1/root/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/root/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/users/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/subDirectory/deployment_folderX/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

Running the test cases above produces the following output:

http://2.2.2.2/~machinehost/deployment_folder/
/~machinehost/deployment_folder/users/bob/settings
Longest match: ~machinehost/deployment_folder/
Suffix: users/bob/settings

http://1.1.1.1/root/~machinehost/deployment_folder/
/root/~machinehost/deployment_folder/users/bob/settings
Longest match: root/~machinehost/deployment_folder/
Suffix: users/bob/settings

http://2.2.2.2/~machinehost/deployment_folder/users/
/~machinehost/deployment_folder/users/bob/settings
Longest match: ~machinehost/deployment_folder/users/
Suffix: bob/settings

http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/
/~machinehost/subDirectory/deployment_folderX/users/bob/settings
Longest match: ~machinehost/subDirectory/
Suffix: deployment_folderX/users/bob/settings

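Maybe you can turn the idea behind this code into something useful for your current project. Let me know whether it works for you. By the way, Mr. oreX's answer looks good too.

answered 2010-11-26T03:55:17.287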
-1

Try this.

http://pastebin.com/GqS3UiPD

answered 2010-11-26T00:12:28.857