c - Case-insensitive string comparison in C

Question

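I have two postcodes as char* that I want to compare, ignoring case. Is there a function to do this?

Or do I have to loop through each one, apply the tolower function, and then do the comparison?

Any idea how this function will react to digits in the string?

Thanks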

score 67 · Accepted Answer

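There is no function that does this in the C standard. Unix systems that comply with POSIX are required to provide strcasecmp in the header strings.h; Microsoft systems have stricmp. To be on the portable side, write your own: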

int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

But note that none of these solutions will work with UTF-8 strings, only ASCII ones.
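As for digits in the postcodes: tolower() returns any character that is not an uppercase letter unchanged, so digits and spaces compare exactly as strcmp() would compare them. A minimal sketch that repeats the strcicmp() above; postcodes_match() is a hypothetical helper name and the sample postcodes are made up for illustration:

```c
#include <ctype.h>

/* Same loop as the strcicmp() shown above. */
int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

/* Hypothetical helper: nonzero when two postcodes match, ignoring case. */
int postcodes_match(char const *a, char const *b)
{
    return strcicmp(a, b) == 0;
}
```

With this, postcodes_match("sw1a 1aa", "SW1A 1AA") is nonzero, while "SW1A 1AA" and "SW1A 2AA" still differ, because the digits are compared as-is.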

score 45 · Accepted Answer

Take a look at strcasecmp() in strings.h.

Answered 2011-04-28T15:11:16.040

score 9 · Accepted Answer

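I found that a built-in function for this exists, in a header that provides string functions in addition to those in the standard header.

Here are the relevant signatures: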

int  strcasecmp(const char *, const char *);
int  strncasecmp(const char *, const char *, size_t);

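I also found its counterpart in the xnu kernel (osfmk/device/subrs.c), implemented as shown in the following code, so you should not expect any change in behavior for digits compared with the original strcmp function.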

static int tolower(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z')
        ch = 'a' + (ch - 'A');
    return ch;
}

int strcasecmp(const char *s1, const char *s2) {
    const unsigned char *us1 = (const u_char *)s1,
                        *us2 = (const u_char *)s2;

    while (tolower(*us1) == tolower(*us2++))
        if (*us1++ == '\0')
            return (0);
    return (tolower(*us1) - tolower(*--us2));
}
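The strncasecmp() signature listed above limits the comparison to the first n bytes, which is handy for prefix checks. A small sketch, assuming a POSIX system where strings.h declares it; has_prefix_ci() is a hypothetical name:

```c
#include <string.h>   /* strlen */
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper: nonzero when s starts with prefix, ignoring case.
 * If s is shorter than prefix, the comparison stops at the terminator
 * of s and reports a difference. */
int has_prefix_ci(const char *s, const char *prefix)
{
    return strncasecmp(s, prefix, strlen(prefix)) == 0;
}
```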

score 7 · Accepted Answer

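Additional pitfalls to watch out for when doing case-insensitive comparisons:

Comparing as lowercase or as uppercase? (common enough issue)

Both of the functions below return 0 for strcicmpL("A", "a") and strcicmpU("A", "a").
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return results of different sign, as '_' often lies between the uppercase and lowercase letters.

This affects the sort order when used with qsort(..., ..., ..., strcicmp). Non-standard library C functions such as the commonly available stricmp() or strcasecmp() tend to be well defined and favor comparing via lowercase. Yet variations exist.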

int strcicmpL(char const *a, char const *b) {
  while (*b) {
    int d = tolower(*a) - tolower(*b);
    if (d) {
        return d;
    } 
    a++;
    b++;
  } 
  return tolower(*a);
}

int strcicmpU(char const *a, char const *b) {
  while (*b) {
    int d = toupper(*a) - toupper(*b);
    if (d) {
        return d;
    } 
    a++;
    b++;
  } 
  return toupper(*a);
}
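To make the sign difference concrete, here is a sketch of both variants with the unsigned char conversion applied, plus qsort() adapters. The names ci_cmp_lower, ci_cmp_upper, sort_ci_lower and sort_ci_upper are hypothetical, and the ordering described assumes ASCII, where '_' (95) sits between 'Z' (90) and 'a' (97):

```c
#include <ctype.h>
#include <stdlib.h>

/* Compare via lowercase; bytes are read as unsigned char. */
int ci_cmp_lower(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    for (;; ua++, ub++) {
        int d = tolower(*ua) - tolower(*ub);
        if (d != 0 || *ua == '\0')
            return d;
    }
}

/* Same loop, but compare via uppercase. */
int ci_cmp_upper(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    for (;; ua++, ub++) {
        int d = toupper(*ua) - toupper(*ub);
        if (d != 0 || *ua == '\0')
            return d;
    }
}

/* qsort() hands the comparator pointers to the array elements,
 * which here are themselves of type const char *. */
static int adapt_lower(const void *p, const void *q)
{
    return ci_cmp_lower(*(const char *const *)p, *(const char *const *)q);
}

static int adapt_upper(const void *p, const void *q)
{
    return ci_cmp_upper(*(const char *const *)p, *(const char *const *)q);
}

/* Sort in place and return v so that calls can be chained. */
const char **sort_ci_lower(const char **v, size_t n)
{
    qsort(v, n, sizeof *v, adapt_lower);
    return v;
}

const char **sort_ci_upper(const char **v, size_t n)
{
    qsort(v, n, sizeof *v, adapt_upper);
    return v;
}
```

Sorting {"Ax", "_x", "bx"} puts "_x" first with the lowercase variant and last with the uppercase variant, so the choice is visible to users of any sorted listing.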

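char can have a negative value. (not rare)

toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char was converted to unsigned char, regardless of whether char is signed or unsigned.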

tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct (Almost - see following)

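char can have a negative value and not be 2's complement. (rare)

The above does not handle -0 or other negative values properly, because the bit pattern should be interpreted as unsigned char. To handle all integer encodings properly, change the pointer type first.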

// tolower((unsigned char) *a);
tolower(*(const unsigned char *)a); // Correct
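Wrapping that cast in a tiny helper keeps call sites tidy. A sketch; lower_at() is a hypothetical name, and the remark about 0xE4 assumes the default "C" locale, in which tolower() changes only A-Z:

```c
#include <ctype.h>

/* Hypothetical helper: lowercase the character that p points at.
 * Reading through const unsigned char * means tolower() never
 * receives a negative value, whatever the signedness of char. */
int lower_at(const char *p)
{
    return tolower(*(const unsigned char *)p);
}
```

Here lower_at("A") yields 'a', digits pass through unchanged, and lower_at("\xE4") yields 0xE4 in the "C" locale instead of passing a possibly negative value to tolower().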

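Locale (less common)

Although character sets using ASCII codes (0-127) are ubiquitous, the remaining codes tend to have locale-specific issues. So strcasecmp("\xE4", "a") might return 0 on one system and non-zero on another.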
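One way to observe this on a given system is to run the comparison under an explicitly selected LC_CTYPE locale. This is only a sketch: strcasecmp_in_locale() and its fallback parameter are hypothetical, it assumes a POSIX strcasecmp(), and locale names other than "C" are system-specific.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical helper: run strcasecmp() under the named LC_CTYPE locale,
 * then restore the previous one. Returns fallback if the locale cannot
 * be selected (or memory runs out). */
int strcasecmp_in_locale(const char *locale_name,
                         const char *a, const char *b, int fallback)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    int result = fallback;

    if (cur != NULL) {
        /* Copy the name: later setlocale() calls may overwrite it. */
        saved = malloc(strlen(cur) + 1);
        if (saved != NULL)
            strcpy(saved, cur);
    }
    if (saved != NULL && setlocale(LC_CTYPE, locale_name) != NULL) {
        result = strcasecmp(a, b);
        setlocale(LC_CTYPE, saved);  /* restore the caller's locale */
    }
    free(saved);
    return result;
}
```

Under "C", the pair "\xE4" and "a" compares as different; passing the name of another locale installed on the system shows whether the answer changes there.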

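Unicode (the way of the future)

If a solution needs to handle more than ASCII, consider a unicode_strcicmp(). As the C library does not provide such a function, a pre-coded function from some alternative library is recommended. Writing your own unicode_strcicmp() is a daunting task.

Do all letters map one lower to one upper? (pedantic)

[A-Z] maps one-to-one with [a-z], yet various locales map several lowercase characters to one uppercase character and vice versa. Further, some uppercase characters may lack a lowercase equivalent and, again, vice versa.

This obliges code to convert through both toupper() and tolower().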

int d = tolower(toupper(*a)) - tolower(toupper(*b));

Again, sorting results can differ depending on whether code uses tolower(toupper(*a)) or toupper(tolower(*a)).
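Put together, a comparison that folds through both functions might look like the sketch below. strcicmp_roundtrip() is a hypothetical name; in the "C" locale it behaves like a plain lowercase compare.

```c
#include <ctype.h>

/* Fold each byte through toupper() and then tolower(), so characters
 * that share an uppercase form compare equal. */
int strcicmp_roundtrip(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    for (;; ua++, ub++) {
        int d = tolower(toupper(*ua)) - tolower(toupper(*ub));
        if (d != 0 || *ua == '\0')
            return d;
    }
}
```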

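Portability

@B. Nadolson recommends avoiding rolling your own strcicmp(), and this is reasonable, except when code needs highly equivalent portable functionality.

Below is an approach which even performed faster than some system-provided functions. It does a single compare per loop iteration rather than two, by using two tables that differ only at '\0'. Your results may vary.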

#include <limits.h>  // UCHAR_MAX

static unsigned char low1[UCHAR_MAX + 1] = {
  0, 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...  // @ABC... Z[...
  '`', 'a', 'b', 'c', ... 'z', '{', ...  // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
// v--- Not zero, but A which matches none in `low1[]`
  'A', 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...
  '`', 'a', 'b', 'c', ... 'z', '{', ...
};

int strcicmp_ch(char const *a, char const *b) {
  // compare using tables that differ slightly.
  while (low1[*(const unsigned char *)a] == low2[*(const unsigned char *)b]) {
    a++;
    b++;
  }
  // Either strings differ or null character detected.
  // Perform subtraction using same table.
  return (low1[*(const unsigned char *)a] - low1[*(const unsigned char *)b]);
}
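Since the tables above are elided, here is one way to fill them at run time and try the idea out. This is a sketch, not the benchmarked code: the lazy initialization (which is not thread-safe), the names fill_tables() and strcicmp_tbl(), and the use of tolower() to populate the tables are all assumptions.

```c
#include <ctype.h>
#include <limits.h>

static unsigned char low1[UCHAR_MAX + 1];
static unsigned char low2[UCHAR_MAX + 1];
static int tables_ready = 0;

/* low1[c] = tolower(c). low2 is identical except low2[0] = 'A',
 * a value that never occurs in low1 because low1['A'] is 'a'. */
static void fill_tables(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        low1[c] = (unsigned char)tolower(c);
        low2[c] = low1[c];
    }
    low2[0] = 'A';
    tables_ready = 1;
}

int strcicmp_tbl(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;

    if (!tables_ready)
        fill_tables();

    /* One test per iteration: the loop ends on a mismatch or as soon as
     * either string reaches its terminator. */
    while (low1[*ua] == low2[*ub]) {
        ua++;
        ub++;
    }
    /* Compute the result with a single table. */
    return low1[*ua] - low1[*ub];
}
```

The trick is that low1[0] is 0 while low2[0] is 'A', so a terminator on either side can never satisfy the loop condition, and no separate end-of-string test is needed.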

score 6 · Accepted Answer

I would use stricmp(). It compares two strings without regard to case.

Note that, in some cases, converting the strings to lowercase first can be faster.
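One case where converting first pays off is when the same string is compared many times: lowercase it once, then use plain strcmp(). A sketch with hypothetical helper names lowercase_in_place() and is_yes():

```c
#include <ctype.h>
#include <string.h>

/* Lowercase s in place and return it, so later comparisons can use
 * plain strcmp(). s must point to writable memory. */
char *lowercase_in_place(char *s)
{
    for (unsigned char *p = (unsigned char *)s; *p != '\0'; p++)
        *p = (unsigned char)tolower(*p);
    return s;
}

/* Example: an answer lowered once can be matched against
 * lowercase constants without any further case handling. */
int is_yes(char *answer)
{
    return strcmp(lowercase_in_place(answer), "yes") == 0;
}
```

Note that this modifies the caller's buffer, so it is not suitable for string literals or for data whose original case must be preserved.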

score 4 · Accepted Answer

As others have said, there is no portable function that works on all systems. You can partially work around this with a simple ifdef:

#include <stdio.h>

#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif

int main() {
    printf("%d", strcasecmp("teSt", "TEst"));
}
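The same ifdef can be hidden behind one small function so that the rest of the code never sees the platform difference. A sketch; equals_ignore_case() is a hypothetical name:

```c
#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else /* assuming a POSIX or BSD compliant system */
#include <strings.h>
#endif

/* Nonzero when a and b are equal, ignoring case. */
int equals_ignore_case(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}
```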

score 4 · Accepted Answer

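I'm not really a fan of the most-upvoted answer here (in part because it seems to be incorrect: it should continue when it reads a null terminator in only one of the two strings, not both at once, and it doesn't do this), so I wrote my own.

This is a direct drop-in replacement for strncmp(), and has been tested with numerous test cases, as shown below.

It is identical to strncmp() except:

It is case-insensitive.
The behavior is NOT undefined (it is well-defined) if either string is a null ptr. Regular strncmp() has undefined behavior if either string is a null ptr (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
It returns INT_MIN as a special sentinel error value if either input string is a NULL ptr.

LIMITATIONS: Note that this code works on the original 7-bit ASCII character set only (decimal values 0 to 127, inclusive), NOT on Unicode characters, such as the Unicode character encodings UTF-8 (the most popular), UTF-16, and UTF-32.

Here is the code only (no comments):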

int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    while ((chars_compared < num) && (*str1 || *str2))
    {
        ret_code = tolower((int)(*str1)) - tolower((int)(*str2));
        if (ret_code != 0)
        {
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Fully-commented version:

/// \brief      Perform a case-insensitive string compare (`strncmp()` case-insensitive) to see
///             if two C-strings are equal.
/// \note       1. Identical to `strncmp()` except:
///               1. It is case-insensitive.
///               2. The behavior is NOT undefined (it is well-defined) if either string is a null
///               ptr. Regular `strncmp()` has undefined behavior if either string is a null ptr
///               (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
///               3. It returns `INT_MIN` as a special sentinel value for certain errors.
///             - Posted as an answer here: https://stackoverflow.com/a/55293507/4561887.
///               - Aided/inspired, in part, by `strcicmp()` here:
///                 https://stackoverflow.com/a/5820991/4561887.
/// \param[in]  str1        C string 1 to be compared.
/// \param[in]  str2        C string 2 to be compared.
/// \param[in]  num         max number of chars to compare
/// \return     A comparison code (identical to `strncmp()`, except with the addition
///             of `INT_MIN` as a special sentinel value):
///
///             INT_MIN (usually -2147483648 for int32_t integers)  Invalid arguments (one or both
///                      of the input strings is a NULL pointer).
///             <0       The first character that does not match has a lower value in str1 than
///                      in str2.
///              0       The contents of both strings are equal.
///             >0       The first character that does not match has a greater value in str1 than
///                      in str2.
int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    // Check for NULL pointers
    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    // Continue doing case-insensitive comparisons, one-character-at-a-time, of `str1` to `str2`, so
    // long as 1st: we have not yet compared the requested number of chars, and 2nd: the next char
    // of at least *one* of the strings is not zero (the null terminator for a C-string), meaning
    // that string still has more characters in it.
    // Note: you MUST check `(chars_compared < num)` FIRST or else dereferencing (reading) `str1` or
    // `str2` via `*str1` and `*str2`, respectively, is undefined behavior if you are reading one or
    // both of these C-strings outside of their array bounds.
    while ((chars_compared < num) && (*str1 || *str2))
    {
        ret_code = tolower((int)(*str1)) - tolower((int)(*str2));
        if (ret_code != 0)
        {
            // The 2 chars just compared don't match
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}
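Because INT_MIN is also a negative number, callers that test the result with < 0 should check for the sentinel first. A usage sketch: the copy of strncmpci() below is condensed from the code above, with an unsigned char conversion added before tolower() (an assumption of this sketch, not part of the original), and equal_ci_n() is a hypothetical wrapper.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

int strncmpci(const char *str1, const char *str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    if (!str1 || !str2)
        return INT_MIN;  /* sentinel: invalid arguments */

    while ((chars_compared < num) && (*str1 || *str2)) {
        ret_code = tolower((unsigned char)*str1) - tolower((unsigned char)*str2);
        if (ret_code != 0)
            break;
        chars_compared++;
        str1++;
        str2++;
    }
    return ret_code;
}

/* Hypothetical wrapper: 1 if equal over the first n chars (ignoring case),
 * 0 if different, -1 if either pointer is NULL. */
int equal_ci_n(const char *a, const char *b, size_t n)
{
    int rc = strncmpci(a, b, n);
    if (rc == INT_MIN)
        return -1;
    return rc == 0;
}
```

Keeping the sentinel check in one wrapper means the rest of the code can treat the three outcomes separately instead of mistaking an invalid argument for "less than".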

Test code:

Download the entire sample code, with unit tests, from my eRCaGuy_hello_world repository: "strncmpci.c":

(this is just a snippet)

int main()
{
    printf("-----------------------\n"
           "String Comparison Tests\n"
           "-----------------------\n\n");

    int num_failures_expected = 0;

    printf("INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!\n");
    EXPECT_EQUALS(strncmpci("hey", "HEY", 3), 'h' - 'H');
    num_failures_expected++;
    printf("------ beginning ------\n\n");


    const char * str1;
    const char * str2;
    size_t n;

    // NULL ptr checks
    EXPECT_EQUALS(strncmpci(NULL, "", 0), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, "", 10), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 10), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 10), INT_MIN);

    EXPECT_EQUALS(strncmpci("", "", 0), 0);
    EXPECT_EQUALS(strncmp("", "", 0), 0);

    str1 = "";
    str2 = "";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "HeY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "hey";
    str2 = "HEdY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'y' - 'd');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "hEYd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'e' - 'E');

    str1 = "heY";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'Y' - 'y');

    str1 = "hey";
    str2 = "hey";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), -'d');

    str1 = "hey";
    str2 = "heyd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hEY";
    str2 = "heyYOU";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEY";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEYHowAre";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'h' - 'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 'n' - 'N');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to meet you.,;", 100), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO UEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to uEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to UEET YOU.,;", 100), 'm' - 'U');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 'n' - 'N');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 5), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 100), 't' - 'e');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 100), 't' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');


    if (globals.error_count == num_failures_expected)
    {
        printf(ANSI_COLOR_GRN "All unit tests passed!" ANSI_COLOR_OFF "\n");
    }
    else
    {
        printf(ANSI_COLOR_RED "FAILED UNIT TESTS! NUMBER OF UNEXPECTED FAILURES = %i"
            ANSI_COLOR_OFF "\n", globals.error_count - num_failures_expected);
    }

    assert(globals.error_count == num_failures_expected);
    return globals.error_count;
}

Sample output:

$ gcc -Wall -Wextra -Werror -ggdb -std=c11 -o ./bin/tmp strncmpci.c && ./bin/tmp
-----------------------
String Comparison Tests
-----------------------

INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!
FAILED at line 250 in function main! strncmpci("hey", "HEY", 3) != 'h' - 'H'
  a: strncmpci("hey", "HEY", 3) is 0
  b: 'h' - 'H' is 32

------ beginning ------

All unit tests passed!

References:

This question and the other answers here served as inspiration and gave some insight (Case Insensitive String comp in C)
http://www.cplusplus.com/reference/cstring/strncmp/
https://en.wikipedia.org/wiki/ASCII
https://en.cppreference.com/w/c/language/operator_precedence
Undefined behavior research I did to fix part of my code above (see comments below):
1. Google search for "c undefined behavior reading outside array bounds"
2. Is accessing a global array outside its bound undefined behavior?
3. https://en.cppreference.com/w/cpp/language/ub - see also the many really great "External Links" at the bottom!
4. 1/3: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
5. 2/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
6. 3/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html
7. https://blog.regehr.org/archives/213
8. https://www.geeksforgeeks.org/accessing-array-bounds-ccpp/

Topics to further research

(Note: this is C++, not C) Lowercase of Unicode character
tolower_tests.c on OnlineGDB: https://onlinegdb.com/HyZieXcew

TODO:

Make a version of this code which also works on Unicode's UTF-8 implementation (character encoding)!

score 1 · Accepted Answer

Simple solution:

int str_case_ins_cmp(const char* a, const char* b) {
  int rc;

  while (1) {
    rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);
    if (rc || !*a) {
      break;
    }

    ++a;
    ++b;
  }

  return rc;
}

score 1 · Accepted Answer

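You can get an idea here of how to implement an efficient one if your library does not provide any

It uses a table for all 256 chars.

In that table, every char except the letters maps to its own ASCII code.
For the uppercase letter codes, the table holds the codes of the corresponding lowercase symbols.

Then we just need to traverse the strings and compare the table cells for the given chars: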

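/* charmap is the 256-entry table described above; the bytes are read as
   unsigned char so that a negative char never becomes a negative index. */
const unsigned char *cm = charmap,
        *us1 = (const unsigned char *)s1,
        *us2 = (const unsigned char *)s2;
while (cm[*us1] == cm[*us2++])
    if (*us1++ == '\0')
        return (0);
return (cm[*us1] - cm[*--us2]);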

score 0 · Accepted Answer

static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
    int k;
    for (k = 0; k < length; k++)
    {

        if ((str1[k] | 32) != (str2[k] | 32))
            break;
    }

    if (k != length)
        return 1;
    return 0;
}

Reference

score 0 · Accepted Answer

If we have null-terminated strings:

   bool striseq(const char* s1,const char* s2){ 
     for(;*s1;){ 
       if(tolower(*s1++)!=tolower(*s2++)) 
         return false; 
      } 
      return *s1 == *s2;
    }

Or this version, which uses bitwise operations:

    int striseq(const char* s1,const char* s2)
       {for(;*s1;) if((*s1++|32)!=(*s2++|32)) return 0; return *s1 == *s2;}

I'm not sure whether this works with symbols; I haven't tested that, but it works fine with letters.
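On the doubt about symbols: OR-ing with 32 folds case correctly only for ASCII letters. It also makes '@' equal to '`', '[' equal to '{', and, more seriously, the terminator '\0' equal to a space, so the loop above can read past the end of the shorter string. A sketch of a stricter per-character test, assuming ASCII; the names eq_or32(), fold_ascii() and eq_ascii_ci() are hypothetical:

```c
/* The bitwise shortcut from the answer above, for one character pair. */
int eq_or32(char a, char b)
{
    return (a | 32) == (b | 32);
}

/* Fold only 'A'..'Z'; every other byte is left unchanged. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Case-insensitive for ASCII letters, exact for everything else. */
int eq_ascii_ci(char a, char b)
{
    return fold_ascii((unsigned char)a) == fold_ascii((unsigned char)b);
}
```

With the stricter test, '@' and '`' are different and a terminator never matches a space, while 'q' and 'Q' still compare equal.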

score -1 · Accepted Answer

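#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated lowercase copy of a; the caller must free() it. */
char *lowerCaseWord(const char *a)
{
    size_t len = strlen(a);
    char *b = malloc(len + 1);          /* +1 for the terminating '\0' */
    if (b == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        b[i] = (char)tolower((unsigned char)a[i]);
    b[len] = '\0';
    return b;
}

int strcmpInsensitive(const char *a, const char *b)
{
    char *la = lowerCaseWord(a);
    char *lb = lowerCaseWord(b);
    int rc = (la && lb) ? strcmp(la, lb) : INT_MIN;  /* INT_MIN: out of memory */
    free(la);
    free(lb);
    return rc;
}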

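Good luck

Edit: the lowerCaseWord function takes a char* variable and returns the lowercase value of that char*. For example, "AbCdE" as the char* value will return "abcde".

Basically, what it does is convert the two char* variables to lowercase and then use the strcmp function on them.

For example, if we call the strcmpInsensitive function with the values "AbCdE" and "ABCDE", it will first produce both values in lowercase ("abcde") and then run the strcmp function on them.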
