php - PHP string differences and dynamic restrictions

Question

Example A [simplified]: ------------------------------------------

MODEL : String {1} and keeeeeep on {2} and oooon...

CASE_A : String hello and keeeeeep on two words and oooon...

CASE_B : String anything else and keeeeeep on just for fun and oooon...

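I need to get a list of n variables named $v1, $v2, $vn... with their respective matched values:

Edit:
Note that the variable names are taken from the placeholders. Placeholders are always INT. (The numbers are just indexes, not word counts.)

For CASE_A:
$v1=hello
$v2=two words
$vn=etc...

For CASE_B:
$v1=anything else
$v2=just for fun
$vn=etc...

You can see that the reference points for extracting these values are the "constant" parts shared by the two strings.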

Example B [almost real]: ------------------------------------------

Now let's assume that every possible match is stored in an array (in reality it is a long database table), like this:

possible_matches {
[0] string three <<<<
[1] Oneword <<<<
[2] Other stuff
[3] Harry Poppotter
[4] two words <<<<
[5] Magic words magic feelings
}

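This was not necessary in the previous example, because every {n} "placeholder" was separated from the next by a "constant" string. In some cases, however, the "placeholders" sit next to each other, so I have to come up with a new way of matching them against the possible matches (a fixed list).

String {1} and keeeeeep on {2} {3} and oooon...

String Oneword and keeeeeep on two words string three and oooon...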

As you can see (based on the array shown above), the result should be:
$v1=Oneword
$v2=two words
$v3=string three

But how is PHP supposed to know where I want the string to be split?

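My idea is to do the following:

1) Take the {2}{3} block as one single block.
2) Check whether this block (two words string three) is in_array().
3) If not:
4) Remove its last word.
5) Check the new block again (two words string).
6) If not:
<<<<
4') Remove its last word.
5') Check the new block again (two words).
<<<<
7) Repeat 4 and 5 until the block is a possible match (in_array()).
8) The part that matched will be {2}, and the rest of the string will be {3}.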
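The steps above can be sketched roughly as follows. This is only an illustration of the asker's idea, not a tested solution: `split_block()` is a hypothetical helper name, the possible matches are assumed to fit in a plain array, and the nullable return type assumes PHP 7.1 or later. Unlike step 2, the sketch never tests the whole block, because a full match would leave nothing for {3}.

```php
<?php
// Hypothetical helper: split a block holding two adjacent placeholders by
// dropping trailing words until the remaining head is a known match.
function split_block(string $block, array $possible_matches): ?array
{
    $words = explode(' ', trim($block));
    // Start one word short of the full block so the tail is never empty.
    for ($len = count($words) - 1; $len >= 1; $len--) {
        $head = implode(' ', array_slice($words, 0, $len));
        if (in_array($head, $possible_matches, true)) {
            $tail = implode(' ', array_slice($words, $len));
            return array($head, $tail);
        }
    }
    return null; // no leading part of the block is a known match
}

$possible_matches = array('string three', 'Oneword', 'Other stuff', 'two words');
$result = split_block('two words string three', $possible_matches);
// $result[0] is the value for {2}, $result[1] the value for {3}
```

Note that the tail is taken as-is and is not itself checked against the list; a stricter variant would also require `in_array($tail, $possible_matches, true)`.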

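My question: how can I do this in PHP?
I have tried to explain it as simply as I can, and I hope you understand what I am asking. If anyone needs more examples, let me know and I will write them up. Thanks for reading.

EDIT ------------------------------------------
A real example: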

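Array: possible_matches{
[0] Christopher Johnson
[1] McCandless
[2] cinema
[3] tomorrow at night
}

MODEL: My name is {1} {2}, and I am going to {3}{4}

CASE: My name is Christopher Johnson McCandless, and I am going to cinema tomorrow at night.

Desired result:
$v1=Christopher Johnson
$v2=McCandless
$v3=cinema
$v4=tomorrow at night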

Building the array of possible combinations

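function get_possible_groups($string_of_words, $groups_count){
    // NOTE: $groups_count is not used yet; only two-group splits are built.
    $words=explode(' ',$string_of_words);
    $total=count($words);
    $group_1=array();
    $group_2=array();
    // We can create TOTAL-1 combinations, so that neither group is ever empty
    for($i=0;$i<$total-1;$i++){
        $lim=$total-$i-1; // number of words that go into the first group
        for($j=0;$j<$total;$j++){
            if($j<$lim){
                $group_1[$i][]=$words[$j];
            }else{
                $group_2[$i][]=$words[$j];
            }
        }
    }
    // $group_1[$i] and $group_2[$i] together form combination number $i
    return array($group_1,$group_2);
}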

Update referred to in the comments on the accepted answer

$model="Damn you {1}, {2} will kill you. {3}{4}{5}";
//Array => Save how many single placeholders are in each "placeholder block"
$placeholder_count=array(
    0=>1, //first block contains one placeholder
    1=>1, //second block contains one placeholder
    2=>3  //third block contains three placeholders
);
//Simplify all blocks into ONE SINGLE regex placeholder
$pattern="/Damn you (.*), (.*) will kill you\. (.*)/";

//Match in string
$string="Damn you Spar, Will will kill you. I Love it man.";
preg_match($pattern,$string,$matches);

//View in array which placeholders have to be checked
$block_0=$matches[1]; //Array shows it was 1 p.holder. No check needed
$block_1=$matches[2]; //Array shows it was 1 p.holder. No check needed
$block_2=$matches[3]; //Array shows it had 3 p.holders. Grouping needed (n=3)

//Result
$v1=$matches[1];
$v2=$matches[2];

//$v3, $v4, $v5 = result of grouping $matches[3] with groups_count=3 and querying the possible matches
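The pattern and the placeholder counts above are written by hand. As a rough sketch, they could be derived from the model itself. `model_to_pattern()` is a hypothetical helper, and the sketch assumes that placeholders separated only by whitespace belong to the same block and that every block may be captured with a greedy `(.*)`.

```php
<?php
// Hypothetical helper: derive a regex and the per-block placeholder counts
// from a model string such as "Damn you {1}, {2} will kill you. {3}{4}{5}".
function model_to_pattern(string $model): array
{
    // Split into constant text and runs of adjacent placeholders.
    // With PREG_SPLIT_DELIM_CAPTURE the runs land on the odd indexes.
    $parts = preg_split(
        '/(\{\d+\}(?:\s*\{\d+\})*)/',
        $model,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    $regex = '';
    $counts = array();
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // A placeholder block: remember how many placeholders it holds.
            $counts[] = preg_match_all('/\{\d+\}/', $part);
            $regex .= '(.*)';
        } else {
            // Constant text: escape it so it is matched literally.
            $regex .= preg_quote($part, '/');
        }
    }
    return array('/^' . $regex . '$/', $counts);
}

list($pattern, $placeholder_count) =
    model_to_pattern('Damn you {1}, {2} will kill you. {3}{4}{5}');
preg_match($pattern, 'Damn you Spar, Will will kill you. I Love it man.', $matches);
// $matches[1..3] hold one captured string per block;
// $placeholder_count tells which of them still need to be split.
```

Blocks whose count is greater than 1 are the ones that still have to go through the grouping step.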

score 1 · Accepted Answer

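When Christopher Johnson McCandless is mapped to {1}{2}:

The possible combinations for forming two groups are:

Christopher Johnson and McCandless
Christopher and Johnson McCandless

When cinema tomorrow at night is mapped to {3}{4}:

The possible combinations for forming two groups are:

cinema and tomorrow at night
cinema tomorrow and at night
cinema tomorrow at and night

Write a PHP function get_possible_groups($string_of_words, $group_count) that returns an array of arrays of group combinations.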
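One way such a function might look is sketched below. The answer only names the function and its parameters; the recursive approach, the `split_words()` helper and the exact shape of the return value (one array of group strings per combination) are assumptions made for illustration.

```php
<?php
// Sketch: every way to cut the words into $group_count consecutive,
// non-empty groups. Each combination is an array of group strings.
function get_possible_groups(string $string_of_words, int $group_count): array
{
    $words = preg_split('/\s+/', trim($string_of_words));
    return split_words($words, $group_count);
}

// Hypothetical recursive helper working on the word list.
function split_words(array $words, int $group_count): array
{
    if ($group_count < 1) {
        return array();
    }
    if ($group_count === 1) {
        // Only one group left: it takes all remaining words.
        return array(array(implode(' ', $words)));
    }
    $combinations = array();
    // Leave at least one word for each of the remaining groups.
    $max_first = count($words) - ($group_count - 1);
    for ($len = 1; $len <= $max_first; $len++) {
        $first = implode(' ', array_slice($words, 0, $len));
        $rests = split_words(array_slice($words, $len), $group_count - 1);
        foreach ($rests as $rest) {
            $combinations[] = array_merge(array($first), $rest);
        }
    }
    return $combinations;
}

$combinations = get_possible_groups('cinema tomorrow at night', 2);
// Three combinations, in the same order as the list above.
```

The number of combinations grows quickly with the number of words and groups, so for long blocks it may be worth filtering candidates early instead of generating all of them.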

Then run an SQL statement like the following:

SELECT count(*), 'cinema' firstWordGroup, 'tomorrow at night' secondWordGroup
  FROM possibleMatchTable
 WHERE possible_match IN ('cinema', 'tomorrow at night')
UNION
SELECT count(*), 'cinema tomorrow', 'at night'
  FROM possibleMatchTable
 WHERE possible_match IN ('cinema tomorrow', 'at night')
UNION
SELECT count(*), 'cinema tomorrow at', 'night'
  FROM possibleMatchTable
 WHERE possible_match IN ('cinema tomorrow at', 'night');

One possible output could be:

+----------+--------------------+-------------------+
| count(*) | firstWordGroup     | secondWordGroup   |
+----------+--------------------+-------------------+
|        2 | cinema             | tomorrow at night |
|        0 | cinema tomorrow    | at night          |
|        0 | cinema tomorrow at | night             |
+----------+--------------------+-------------------+

Whichever row has a count of 2 (both word groups were found) is your answer.
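If the possible matches fit in memory, the same check can be sketched in PHP without a query. `pick_grouping()` is a hypothetical helper; it assumes each combination is an array of group strings and returns the first one whose groups are all known matches.

```php
<?php
// Hypothetical in-memory counterpart of the count(*) check above.
function pick_grouping(array $combinations, array $possible_matches): ?array
{
    foreach ($combinations as $groups) {
        // Count how many groups of this combination are known matches.
        $count = count(array_intersect($groups, $possible_matches));
        if ($count === count($groups)) {
            return $groups; // every group matched, like count(*) = 2 above
        }
    }
    return null; // no combination matched completely
}

$possible_matches = array('Christopher Johnson', 'McCandless', 'cinema', 'tomorrow at night');
$combinations = array(
    array('cinema', 'tomorrow at night'),
    array('cinema tomorrow', 'at night'),
    array('cinema tomorrow at', 'night'),
);
$groups = pick_grouping($combinations, $possible_matches);
// $groups[0] is the value for {3}, $groups[1] the value for {4}
```

For a large table of possible matches the SQL version is likely the better fit, since it avoids loading the whole list into PHP.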

If the MODEL text is stored in a fulltext-indexed column, then for any given random string you can fetch the most relevant model, for example:

SELECT * FROM model_strings 
WHERE MATCH(model) AGAINST ('Damn you Spar, Kot will kill you.');

The query might return something like:

+----------------------------------+
| model                            |
+----------------------------------+
| Damn you {1}, {2} will kill you. |
+----------------------------------+

Extract the words of the random string using the placeholders from the Model:

<?php 

$placeholder_pRegEx = '#\{\d+\}#';

$model = 'Damn you {1}, {2} will kill you. {3}{4}{5}';
$string = 'Damn you Spar, Will will kill you. I Love it man.';

$model_words = explode(' ', $model);
$string_words = explode(' ', $string);

$placeholder_words = array();

for ($idx =0, $jdx=0; $idx < count($string_words); $idx ++) {

    if ($jdx < count($model_words)) {
        if (strcmp($string_words[$idx], $model_words[$jdx])) {
            $placeholder_words[] = $string_words[$idx];

            //Move to next word in Model only if it's a placeholder
            if (preg_match($placeholder_pRegEx, $model_words[$jdx]))
                $jdx++;

        } else
            $jdx++; //they match so move to next word
    } else
        $placeholder_words[] = $string_words[$idx];
}

//Even status will have the count
$status = preg_match_all ($placeholder_pRegEx, $model, $placeholders);

$group_count = count($placeholders[0]);

var_dump(get_defined_vars());
?>

The code above will give you values like these:

'placeholder_words' => array (size=6)
  0 => string 'Spar,' (length=5)
  1 => string 'Will' (length=4)
  2 => string 'I' (length=1)
  3 => string 'Love' (length=4)
  4 => string 'it' (length=2)
  5 => string 'man.' (length=4)

'placeholders' => array (size=1)
  0 => 
    array (size=5)
      0 => string '{1}' (length=3)
      1 => string '{2}' (length=3)
      2 => string '{3}' (length=3)
      3 => string '{4}' (length=3)
      4 => string '{5}' (length=3)

'group_count' => int 5

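From there you can call get_possible_groups()
and then run the SQL query to check the actual words in the candidate groupings
against the allowed possible matches.

Phew, that is quite a question, eh!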
