2

假设我有随机的文本块:

EAMoAAQAABwEBAAAAAAAAAAAAAAABAgMFBgcIBAkBAQABBQEBAAAAAAAAAAAAAAAGAgMEBQcBCBAAAQMDAgMEBQcIBQgGCwEAAQACAxEEBSEGMRIHQVFhE3GBIhQIkaGxwTJCI9FScoKSojMV8GLCUxbhstKDo7M0ZHOTJEQlF/HiQ2PDVHSExEUmGBEBAAIBAgMDCAgCCgMBAQEAAAECAxEEITEFQRIGUWFxgZGhIhPwscHRMlIUB0Jy4fGCkqLCI1MVFrLSQ2IzF//aAAwDAQACEQMRAD8A7+QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEEDwXkzpxHgusxi7NrnXF3G0NBLhzAkAeAqVH934r6bt57uTPSJ8ne1n2Rqycezy35VlRttwYu5DXNlLOcczOdpHM3hUUqtLs/wBxulZonXJ8vjp8caa+eOa5k6flrPLVcIbm3n/gytf4NcCVKtj1XbbqNcOSuT+W0W+pi3x2rzjRWWxUCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAggV5It2Uy8GNYAWmW6kr5MDftO8T3BRXxR4s2/SccTb48lvw0jnPnn8tfP6o1Ze02ds08OERzlid+/P5Orp5BHEeFuxxa0Dxpx9a+fOu+Iup9Tmfm30p+Ss92vr/N6bat/t67fDyjWfLLG79pt45YpAA8NdUAg9ngolTFNbedtqWi0avVicv5bLKFr2kSRltHaahrXCnylZcd6k208rDy4ItxlkUr5+XnZE1zxq0h3KfUQqv1GWsxeI0tHKY1rPtjRgVivKZU7HebrS491ybX+TWnO7V7PEn7w+f0rpPhb9zdxt7Rj3szkx/n/AI6+n88f4vTyebno8Wr3qTGvun7mawSxzsbNC4Pje0Oa9pqCD2grv+3z0zUi9Ji1bRrEx2wjtqzWdJ5wqq8pEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQU

规格:

patternABC >= 2 characters = groupABC IF groupABC occurs more than once
groupABC + (groupABC)n = sequence WHERE n >= 1 AND sequence > 6 characters

** 一个序列需要大于 6 个字符才能被评估

分解:

如何找到按顺序出现的任何重复模式?

QEBAQEBAQEBAQEBAQEBAQEBA

我还想计算每组重复多少次:

QEBA QEBA QEBA QEBA QEBA QEBA = 6

此外,该序列必须大于 6 个字符才能进行评估:

NO GOOD: AA AA AA
GOOD: AA AA AA AA

如果输出可以存储在关联数组中,并且删除重复条目,那将是理想的:

QEBA => 6, AA => 4, QEBA => 3, AA => 8, (QEBA => 6)<- REMOVE

有没有人有时间和意愿来解决这个问题?如果你这样做,你会摇滚!

4

2 回答 2

3
$str = 'EAMoAAQAABwEBAAAAAAAAAAAAAAABAgMFBgcIBAkBAQABBQEBAAAAAAAAAAAAAAAGAgMEBQcBCBAAAQMDAgMEBQcIBQgGCwEAAQACAxEEBSEGMRIHQVFhE3GBIhQIkaGxwTJCI9FScoKSojMV8GLCUxbhstKDo7M0ZHOTJEQlF/HiQ2PDVHSExEUmGBEBAAIBAgMDCAgCCgMBAQEAAAECAxEEITEFQRIGUWFxgZGhIhPwscHRMlIUB0Jy4fGCkqLCI1MVFrLSQ2IzF//aAAwDAQACEQMRAD8A7+QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEEDwXkzpxHgusxi7NrnXF3G0NBLhzAkAeAqVH934r6bt57uTPSJ8ne1n2Rqycezy35VlRttwYu5DXNlLOcczOdpHM3hUUqtLs/wBxulZonXJ8vjp8caa+eOa5k6flrPLVcIbm3n/gytf4NcCVKtj1XbbqNcOSuT+W0W+pi3x2rzjRWWxUCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAggV5It2Uy8GNYAWmW6kr5MDftO8T3BRXxR4s2/SccTb48lvw0jnPnn8tfP6o1Ze02ds08OERzlid+/P5Orp5BHEeFuxxa0Dxpx9a+fOu+Iup9Tmfm30p+Ss92vr/N6bat/t67fDyjWfLLG79pt45YpAA8NdUAg9ngolTFNbedtqWi0avVicv5bLKFr2kSRltHaahrXCnylZcd6k208rDy4ItxlkUr5+XnZE1zxq0h3KfUQqv1GWsxeI0tHKY1rPtjRgVivKZU7HebrS491ybX+TWnO7V7PEn7w+f0rpPhb9zdxt7Rj3szkx/n/AI6+n88f4vTyebno8Wr3qTGvun7mawSxzsbNC4Pje0Oa9pqCD2grv+3z0zUi9Ji1bRrEx2wjtqzWdJ5wqq8pEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQU';

preg_match_all( '/(\S{2,}?)\1+/', $str, $matches );

// Remove duplicates
$matches[0] = array_unique( $matches[0] ); 

foreach ( $matches[0] as $key => $value ) {
    if ( strlen( $value ) > 6 ) {
        $repeated = $matches[1][$key];
        $results[] = array( $repeated => count( explode( $repeated, $value ) ) - 1 );
    }    
}

print_r($results); 

/*
[AA] => 7
[QEBA] => 93
[CAgI] => 18
[EBAQ] => 18
*/

以上假设一个序列由非空格字符组成。

于 2013-03-10T20:01:20.850 回答
1

获取序列preg_match_all('/(?:(.{6,})\1)/',$inputText,$sequences)
(注意:序列将保存在$sequences
解释的正则表达式演示:http ://regex101.com/r/rW4nE2

使用array_unique()删除重复项。

循环遍历每个序列并:通过 解释的 RegEx 演示
获取组: http ://regex101.com/r/pC3pB7preg_match_all('/(.+?)(\1)(\1)?/',$sequence,$groups)

如果需要,请使用count() 。

于 2013-03-10T19:54:40.277 回答