我正在尝试创建一个函数,给定两个 C 字符串,它会返回两个字符串之间连续字符重叠的数量。
例如,
String 1: "Today is monday."
String 2: " is monday."
这里的重叠将是“is monday.”,即 11 个字符(包括空格和 '.')。
如果您需要更高效的东西,请考虑字符串 1 和 2 之间的部分不匹配意味着您可以沿字符串 1 跳跃字符串 2 的其余部分的长度。这意味着您不需要搜索整个字符串 1。
看看Boyer-Moore 算法。尽管它用于字符串搜索,但您可以使用字符串 2 作为模式和字符串 1 作为目标文本来实现此算法以查找最大长度的子字符串。
可能有一种更有效的方法来做到这一点,但这里有一个简单的方法:
#include <string.h>
int main() {
char s1[17] = "Today is monday.";
char s2[12] = " is monday.";
int max = 0;
int i_max = -1;
int j_max = -1;
int i = 0, j = 0, k=0;
int endl = 0, sl1, sl2;
char *ss1, *ss2;
for(i = 0; i < strlen(s1)-1; i++) {
ss1 = s1+i;
sl1 = strlen(ss1);
if(max >= sl1) {
break; // You found it.
}
for(j = 0; j < strlen(s2)-1; j++) {
ss2 = s2+j;
sl2 = strlen(ss2);
if(max >= sl2) {
break; // Can't find a bigger overlap.
}
endl = (sl1 > sl2)?sl2:sl1;
int n_char = 0;
for(k = 0; k < endl+1; k++) {
// printf("%s\t%s\n", ss1+k, ss2+k); // Uncomment if you want to see what it compares.
if(ss1[k] != ss2[k] || ss1[k] == '\0') {
n_char = k;
break;
}
}
if(n_char > max) {
max = n_char;
i_max = i;
j_max = j;
}
}
}
char nstr[max+1];
nstr[max] = '\0';
strncpy(nstr, s1+i_max, max);
printf("Maximum overlap is %d characters, substring: %s\n", max, nstr);
return 0;
}
更新:我已经修复了错误。这绝对可以编译。结果如下:http://codepad.org/SINhmm7f 问题是 endl 定义错误,我没有检查行尾条件。
希望代码不言自明。
这是我的解决方案,它将返回重叠起点的位置,这有点复杂,但这就是它在 C 中的完成方式:
#include <string.h>
int FindOverlap (const char * a, const char * b)
{
// iterators
char * u = a;
char * v = b;
char * c = 0; // overlap iterator
char overlapee = 'b';
if (strlen(a) < strlen(b)) overlapee = 'a';
if (overlapee == 'b')
{
while (*u != '\0')
{
v = b; // reset b iterator
c = u;
while (*v != '\0')
{
if (*c != *v) break;
c++;
v++;
}
if (*v == '\0') return (u-a); // return overlap starting point
}
}
else if (overlapee == 'a')
{
while (*v != '\0')
{
u = a; // reset b iterator
c = v;
while (*u != '\0')
{
if (*c != *u) break;
c++;
u++;
}
if (*v == '\0') return (v-b); // return overlap starting point
}
}
return (-1); // not found
}