88

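I have two postcodes as char* that I want to compare, ignoring case. Is there a function to do this?

Or do I have to loop through each string, apply the tolower function to every character, and then do the comparison?

Any idea how this function will react to digits in the string?

Thanks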

4

12 Answers

67

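There is no function that does this in the C standard. Unix systems that comply with POSIX are required to have strcasecmp in the header strings.h; Microsoft systems have stricmp. To be on the portable side, write your own: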

#include <ctype.h>

int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

But note that none of these solutions will work with UTF-8 strings, only ASCII ones.
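To make the UTF-8 caveat concrete, here is a small sketch that reuses the same loop. It assumes the program runs in the default "C" locale and that the two test strings are the UTF-8 encodings of "É" (bytes C3 89) and "é" (bytes C3 A9); the helper name demo_ascii_only is made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>

/* Same byte-by-byte loop as strcicmp() in this answer. */
int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

/* Hypothetical demo: ASCII letters fold as expected, multi-byte UTF-8 does not. */
void demo_ascii_only(void)
{
    assert(strcicmp("E", "e") == 0);               /* plain ASCII: equal */
    assert(strcicmp("\xC3\x89", "\xC3\xA9") != 0); /* UTF-8 "É" vs "é": not equal */
}
```

In the "C" locale tolower() only maps A-Z, so the second bytes 0x89 and 0xA9 are compared unchanged and the strings are reported as different.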

answered 2011-04-28T15:21:22.903
45

Take a look at strcasecmp() in strings.h.
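For the postcode case in the question, usage could look roughly like this on a POSIX system. The wrapper name same_postcode and the sample postcodes are only assumptions for the example.

```c
#include <assert.h>
#include <strings.h>   /* POSIX: strcasecmp(), strncasecmp() */

/* Hypothetical helper: returns 1 if two postcodes match regardless of letter case. */
int same_postcode(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}

void demo_same_postcode(void)
{
    assert(same_postcode("sw1a 1aa", "SW1A 1AA"));   /* differs only in case */
    assert(!same_postcode("SW1A 1AA", "SW1A 2AA"));  /* a digit differs */
}
```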

answered 2011-04-28T15:11:16.040
9

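I found a built-in function for this, strcasecmp, in strings.h, which contains string functions in addition to those in the standard header.

Here are the relevant signatures: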

int  strcasecmp(const char *, const char *);
int  strncasecmp(const char *, const char *, size_t);

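I also found an equivalent in the xnu kernel (osfmk/device/subrs.c), implemented as in the following code, so you should not expect any change in behavior with digits compared to the original strcmp function.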

tolower(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z')
        ch = 'a' + (ch - 'A');
    return ch;
 }

int strcasecmp(const char *s1, const char *s2) {
    const unsigned char *us1 = (const u_char *)s1,
                        *us2 = (const u_char *)s2;

    while (tolower(*us1) == tolower(*us2++))
        if (*us1++ == '\0')
            return (0);
    return (tolower(*us1) - tolower(*--us2));
}
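The point about digits can be checked directly against the standard library: tolower() and toupper() return any character that is not a letter unchanged. A minimal sketch, with a made-up function name:

```c
#include <assert.h>
#include <ctype.h>

/* Returns 1 if tolower() and toupper() both leave every decimal digit unchanged. */
int digits_unchanged_by_case_mapping(void)
{
    for (int c = '0'; c <= '9'; c++) {
        if (tolower(c) != c || toupper(c) != c)
            return 0;
    }
    return 1;
}

void demo_digits(void)
{
    assert(digits_unchanged_by_case_mapping());
}
```

So digits in a postcode are compared exactly as strcmp would compare them.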
answered 2016-01-04T11:16:54.103
7

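Additional pitfalls to watch out for when doing case-insensitive comparisons:


Comparing as lower or as upper case? (a common enough issue)

Both strcicmpL("A", "a") and strcicmpU("A", "a") below return 0.
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return results of different signs, because '_' often lies between the upper-case and lower-case letters.

This affects the sort order when used with qsort(..., ..., ..., strcicmp). Non-standard library C functions such as the commonly available stricmp() or strcasecmp() tend to be well defined and favor comparing via lower case. Yet variations exist.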

int strcicmpL(char const *a, char const *b) {
  while (*b) {
    int d = tolower(*a) - tolower(*b);
    if (d) {
        return d;
    } 
    a++;
    b++;
  } 
  return tolower(*a);
}

int strcicmpU(char const *a, char const *b) {
  while (*b) {
    int d = toupper(*a) - toupper(*b);
    if (d) {
        return d;
    } 
    a++;
    b++;
  } 
  return toupper(*a);
}
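A quick check of the sign difference, assuming an ASCII-compatible character set ('A' is 65, '_' is 95, 'a' is 97). The names cmp_lower and cmp_upper are stand-ins written for this sketch, with the unsigned char cast already applied.

```c
#include <assert.h>
#include <ctype.h>

/* Compare after folding both sides to lower case. */
int cmp_lower(const char *a, const char *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}

/* Compare after folding both sides to upper case. */
int cmp_upper(const char *a, const char *b)
{
    for (;; a++, b++) {
        int d = toupper((unsigned char)*a) - toupper((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}

void demo_fold_direction(void)
{
    assert(cmp_lower("A", "a") == 0 && cmp_upper("A", "a") == 0);
    assert(cmp_lower("A", "_") > 0);   /* 'a' (97) sorts after  '_' (95) */
    assert(cmp_upper("A", "_") < 0);   /* 'A' (65) sorts before '_' (95) */
}
```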

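char can have a negative value. (not rare)

toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char was converted to unsigned char, regardless of whether char is signed or unsigned.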

tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct (Almost - see following)

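char can have a negative value and not use 2's complement. (rare)

The above does not handle -0 correctly, because the bit pattern should be interpreted as unsigned char. To handle all integer encodings properly, change the pointer type first.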

// tolower((unsigned char) *a);
tolower(*(const unsigned char *)a); // Correct
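Wrapped in a small helper, the pointer-cast form might look like this. The name lower_at is invented for the sketch; the byte 0xE4 is just an example of a value that is negative when char is signed.

```c
#include <assert.h>
#include <ctype.h>

/* Lower-cases the character that p points at, reading it through an
   unsigned char pointer so the value passed to tolower() is never negative. */
int lower_at(const char *p)
{
    return tolower(*(const unsigned char *)p);
}

void demo_lower_at(void)
{
    assert(lower_at("Q") == 'q');
    assert(lower_at("\xE4") >= 0);   /* stays in the unsigned char range */
}
```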

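Locale (less common)

Although character sets using ASCII codes (0-127) are ubiquitous, the remaining codes tend to have locale-specific issues. So strcasecmp("\xE4", "a") might return 0 on one system and non-zero on another.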
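As an illustration of the locale dependence, the sketch below compares how the bytes 0xC4 and 0xE4 fold ("Ä" and "ä" in ISO-8859-1, which is an assumption about the intended encoding). Only the "C" locale result is asserted, since that one is fixed by the standard; in a Latin-1 locale, if one is installed, the answer may well be different.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Returns 1 if the bytes 0xC4 and 0xE4 fold to the same value in the
   current locale, 0 otherwise. */
int c4_matches_e4(void)
{
    return tolower(0xC4) == tolower(0xE4);
}

void demo_locale(void)
{
    setlocale(LC_CTYPE, "C");   /* the "C" locale only maps A-Z and a-z */
    assert(!c4_matches_e4());
}
```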


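Unicode (the way of the future)

If a solution needs to handle more than ASCII, consider a unicode_strcicmp(). As the C library does not provide such a function, a pre-coded function from some alternate library is recommended. Writing your own unicode_strcicmp() is a daunting task.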


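Do all letters map one lower to one upper? (pedantic)

[A-Z] maps one-to-one with [a-z], yet various locales map several lower-case characters to one upper-case character and vice versa. Further, some upper-case characters may lack a lower-case equivalent, and vice versa.

This obliges code to convert through both toupper() and tolower():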

int d = tolower(toupper(*a)) - tolower(toupper(*b));

Again, sorting can produce different results depending on whether code uses tolower(toupper(*a)) or toupper(tolower(*a)).
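Put into a complete function, the round-trip comparison could be sketched as follows. The name strcicmp_roundtrip is made up here, and the pointer cast from the earlier pitfall is applied as well.

```c
#include <assert.h>
#include <ctype.h>

/* Case-insensitive compare that folds each character up and then down,
   so characters sharing an upper-case form compare equal. */
int strcicmp_roundtrip(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;

    for (;; ua++, ub++) {
        int d = tolower(toupper(*ua)) - tolower(toupper(*ub));
        if (d != 0 || *ua == '\0')
            return d;
    }
}

void demo_roundtrip(void)
{
    assert(strcicmp_roundtrip("Hello", "hELLO") == 0);
    assert(strcicmp_roundtrip("ab", "abc") < 0);   /* shorter string sorts first */
}
```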


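Portability

@B. Nadolson recommends against rolling your own strcicmp(), and this is reasonable, except when code needs highly equivalent portable functionality.

Below is an approach that even performed faster than some system-provided functions. It does a single compare per loop iteration rather than two, by using two tables that differ at '\0'. Your results may vary.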

static unsigned char low1[UCHAR_MAX + 1] = {
  0, 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...  // @ABC... Z[...
  '`', 'a', 'b', 'c', ... 'z', '{', ...  // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
// v--- Not zero, but A which matches none in `low1[]`
  'A', 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...
  '`', 'a', 'b', 'c', ... 'z', '{', ...
};

int strcicmp_ch(char const *a, char const *b) {
  // compare using tables that differ slightly.
  while (low1[*(const unsigned char *)a] == low2[*(const unsigned char *)b]) {
    a++;
    b++;
  }
  // Either strings differ or null character detected.
  // Perform subtraction using same table.
  return (low1[*(const unsigned char *)a] - low1[*(const unsigned char *)b]);
}
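Since the table initializers above are abbreviated, here is one way the two tables might be filled at run time instead. This is a sketch, not the measured code: the init_tables name is invented, and it assumes tolower() in the current locale never returns 'A', which holds in the "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

static unsigned char low1[UCHAR_MAX + 1];
static unsigned char low2[UCHAR_MAX + 1];

/* Fill both tables once, before the first comparison. */
void init_tables(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++) {
        low1[i] = (unsigned char)tolower(i);
        low2[i] = low1[i];
    }
    low2[0] = 'A';   /* matches no entry of low1[], so the loop stops at '\0' */
}

int strcicmp_ch(char const *a, char const *b)
{
    /* One table lookup pair and one compare per iteration. */
    while (low1[*(const unsigned char *)a] == low2[*(const unsigned char *)b]) {
        a++;
        b++;
    }
    /* Strings differ or a null character was reached: subtract via one table. */
    return low1[*(const unsigned char *)a] - low1[*(const unsigned char *)b];
}

void demo_tables(void)
{
    init_tables();
    assert(strcicmp_ch("Hello", "hELLO") == 0);
    assert(strcicmp_ch("abc", "ab") > 0);
}
```

Forgetting to call init_tables() leaves low1[] all zero, so every lookup would match and the loop would run past the end of the strings; a static initializer, as in the answer, avoids that risk.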
answered 2018-08-23T18:28:06.140
6

I would use stricmp(). It compares two strings without regard to case.

Note that, in some cases, converting the strings to lower case first can be faster.
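One reading of that remark is that a string compared many times can be normalized once and then compared with plain strcmp(). A minimal sketch under that assumption, with a made-up helper name and ASCII-only input:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Converts s to lower case in place and returns s. */
char *lower_in_place(char *s)
{
    for (char *p = s; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
    return s;
}

void demo_lower_in_place(void)
{
    char stored[] = "SW1A 1AA";          /* must be writable, not a string literal */
    lower_in_place(stored);
    assert(strcmp(stored, "sw1a 1aa") == 0);
}
```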

answered 2011-04-28T15:17:43.810
4

As others have said, there is no portable function that works on all systems. You can partially work around this with a simple ifdef:

#include <stdio.h>

#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif

int main() {
    printf("%d", strcasecmp("teSt", "TEst"));
}
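The same trick should extend to the length-limited variant, since Microsoft's runtime provides _strnicmp as the counterpart of strncasecmp. A sketch, with a hypothetical wrapper name:

```c
#ifdef _WIN32
#include <string.h>
#define strncasecmp _strnicmp   /* Microsoft's counterpart of strncasecmp() */
#else
#include <strings.h>            /* POSIX or BSD */
#endif
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the first n characters of a and b match, ignoring case. */
int same_prefix_ignoring_case(const char *a, const char *b, size_t n)
{
    return strncasecmp(a, b, n) == 0;
}

void demo_prefix(void)
{
    assert(same_prefix_ignoring_case("teSt-one", "TEst-two", 4));
    assert(!same_prefix_ignoring_case("teSt-one", "TEst-two", 8));
}
```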
answered 2019-05-23T15:37:36.720
4

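I'm not really a fan of the most-upvoted answer here (in part because it seems to be incorrect, since it should continue if it reads a null terminator in either string, but not in both strings at once, and it doesn't do this), so I wrote my own.

This is a direct drop-in replacement for strncmp(), and has been tested with numerous test cases, as shown below.

It is identical to strncmp() except:

  1. It is case-insensitive.
  2. The behavior is NOT undefined (it is well-defined) if either string is a null ptr. Regular strncmp() has undefined behavior if either string is a null ptr (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
  3. It returns INT_MIN as a special sentinel error value if either input string is a NULL ptr.

LIMITATIONS: Note that this code works on the original 7-bit ASCII character set only (decimal values 0 to 127, inclusive), NOT on Unicode characters, such as the Unicode character encodings UTF-8 (the most popular), UTF-16, and UTF-32.

Here is the code only (no comments):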

int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    while ((chars_compared < num) && (*str1 || *str2))
    {
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Fully-commented version:

/// \brief      Perform a case-insensitive string compare (`strncmp()` case-insensitive) to see
///             if two C-strings are equal.
/// \note       1. Identical to `strncmp()` except:
///               1. It is case-insensitive.
///               2. The behavior is NOT undefined (it is well-defined) if either string is a null
///               ptr. Regular `strncmp()` has undefined behavior if either string is a null ptr
///               (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
///               3. It returns `INT_MIN` as a special sentinel value for certain errors.
///             - Posted as an answer here: https://stackoverflow.com/a/55293507/4561887.
///               - Aided/inspired, in part, by `strcicmp()` here:
///                 https://stackoverflow.com/a/5820991/4561887.
/// \param[in]  str1        C string 1 to be compared.
/// \param[in]  str2        C string 2 to be compared.
/// \param[in]  num         max number of chars to compare
/// \return     A comparison code (identical to `strncmp()`, except with the addition
///             of `INT_MIN` as a special sentinel value):
///
///             INT_MIN (usually -2147483648 for int32_t integers)  Invalid arguments (one or both
///                      of the input strings is a NULL pointer).
///             <0       The first character that does not match has a lower value in str1 than
///                      in str2.
///              0       The contents of both strings are equal.
///             >0       The first character that does not match has a greater value in str1 than
///                      in str2.
int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    // Check for NULL pointers
    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    // Continue doing case-insensitive comparisons, one-character-at-a-time, of `str1` to `str2`, so
    // long as 1st: we have not yet compared the requested number of chars, and 2nd: the next char
    // of at least *one* of the strings is not zero (the null terminator for a C-string), meaning
    // that string still has more characters in it.
    // Note: you MUST check `(chars_compared < num)` FIRST or else dereferencing (reading) `str1` or
    // `str2` via `*str1` and `*str2`, respectively, is undefined behavior if you are reading one or
    // both of these C-strings outside of their array bounds.
    while ((chars_compared < num) && (*str1 || *str2))
    {
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            // The 2 chars just compared don't match
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Test code:

Download the complete sample code, with unit tests, from my eRCaGuy_hello_world repository: "strncmpci.c"

(this is just a snippet)

int main()
{
    printf("-----------------------\n"
           "String Comparison Tests\n"
           "-----------------------\n\n");

    int num_failures_expected = 0;

    printf("INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!\n");
    EXPECT_EQUALS(strncmpci("hey", "HEY", 3), 'h' - 'H');
    num_failures_expected++;
    printf("------ beginning ------\n\n");


    const char * str1;
    const char * str2;
    size_t n;

    // NULL ptr checks
    EXPECT_EQUALS(strncmpci(NULL, "", 0), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, "", 10), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 10), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 10), INT_MIN);

    EXPECT_EQUALS(strncmpci("", "", 0), 0);
    EXPECT_EQUALS(strncmp("", "", 0), 0);

    str1 = "";
    str2 = "";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "HeY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "hey";
    str2 = "HEdY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'y' - 'd');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "hEYd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'e' - 'E');

    str1 = "heY";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'Y' - 'y');

    str1 = "hey";
    str2 = "hey";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), -'d');

    str1 = "hey";
    str2 = "heyd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hEY";
    str2 = "heyYOU";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEY";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEYHowAre";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'h' - 'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 'n' - 'N');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to meet you.,;", 100), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO UEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to uEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to UEET YOU.,;", 100), 'm' - 'U');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 'n' - 'N');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 5), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 100), 't' - 'e');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 100), 't' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');


    if (globals.error_count == num_failures_expected)
    {
        printf(ANSI_COLOR_GRN "All unit tests passed!" ANSI_COLOR_OFF "\n");
    }
    else
    {
        printf(ANSI_COLOR_RED "FAILED UNIT TESTS! NUMBER OF UNEXPECTED FAILURES = %i"
            ANSI_COLOR_OFF "\n", globals.error_count - num_failures_expected);
    }

    assert(globals.error_count == num_failures_expected);
    return globals.error_count;
}
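The snippet relies on EXPECT_EQUALS and globals, which are defined in the full file and not shown here. To try the snippet without downloading it, a hypothetical stand-in along the following lines could be used. It is only a guess modeled on the output format of this answer, not the repository's actual macro.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the test bookkeeping used in the snippet. */
static struct { int error_count; } globals = { 0 };

/* Hypothetical EXPECT_EQUALS: evaluates both expressions once, compares
   them as int, and records and reports any mismatch. */
#define EXPECT_EQUALS(a, b)                                              \
    do {                                                                 \
        int a_val_ = (a);                                                \
        int b_val_ = (b);                                                \
        if (a_val_ != b_val_) {                                          \
            globals.error_count++;                                       \
            printf("FAILED at line %d in function %s! %s != %s\n"        \
                   "  a: %s is %d\n  b: %s is %d\n\n",                   \
                   __LINE__, __func__, #a, #b, #a, a_val_, #b, b_val_);  \
        }                                                                \
    } while (0)

/* Runs one passing and one deliberately failing check and returns the
   number of failures recorded. */
int demo_expect_equals(void)
{
    globals.error_count = 0;
    EXPECT_EQUALS(2 + 2, 4);        /* passes */
    EXPECT_EQUALS('h' - 'H', 0);    /* fails: the difference is 32 in ASCII */
    assert(globals.error_count == 1);
    return globals.error_count;
}
```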

Sample output:

$ gcc -Wall -Wextra -Werror -ggdb -std=c11 -o ./bin/tmp strncmpci.c && ./bin/tmp
-----------------------
String Comparison Tests
-----------------------

INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!
FAILED at line 250 in function main! strncmpci("hey", "HEY", 3) != 'h' - 'H'
  a: strncmpci("hey", "HEY", 3) is 0
  b: 'h' - 'H' is 32

------ beginning ------

All unit tests passed!

References:

  1. This question and the other answers here served as inspiration and gave some insight (Case Insensitive String comp in C)
  2. http://www.cplusplus.com/reference/cstring/strncmp/
  3. https://en.wikipedia.org/wiki/ASCII
  4. https://en.cppreference.com/w/c/language/operator_precedence
  5. Undefined behavior research I did to fix part of my code above (see comments below):
    1. Google search for "c undefined behavior reading outside array bounds"
    2. Is accessing a global array outside its bound undefined behavior?
    3. https://en.cppreference.com/w/cpp/language/ub - see also the many really great "External Links" at the bottom!
    4. 1/3: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
    5. 2/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
    6. 3/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html
    7. https://blog.regehr.org/archives/213
    8. https://www.geeksforgeeks.org/accessing-array-bounds-ccpp/

Topics for further research

  1. (Note: this is C++, not C) Lowercase of Unicode character
  2. tolower_tests.c on OnlineGDB: https://onlinegdb.com/HyZieXcew

TODO:

  1. Make a version of this code which also works on Unicode's UTF-8 implementation (character encoding)!
answered 2019-03-22T05:40:07.147
1

Simple solution:

int str_case_ins_cmp(const char* a, const char* b) {
  int rc;

  while (1) {
    rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);
    if (rc || !*a) {
      break;
    }

    ++a;
    ++b;
  }

  return rc;
}
answered 2019-12-14T10:33:47.327
1

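If your library does not provide one, you can get an idea of how to implement an efficient version from the implementation excerpted below.

It uses a table for all 256 chars.

  • In that table, every char except the letters maps to its own ASCII code.
  • For the upper-case letter codes, the table holds the codes of the corresponding lower-case letters.

Then we just need to traverse the strings and compare the table cells for the given chars: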

const unsigned char *cm = charmap,
        *us1 = (const unsigned char *)s1,
        *us2 = (const unsigned char *)s2;
while (cm[*us1] == cm[*us2++])
    if (*us1++ == '\0')
        return (0);
return (cm[*us1] - cm[*--us2]);
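The excerpt leaves out the table and the function around it. One way to fill in the missing pieces, as a sketch: the table is built at run time with tolower() rather than written out, and the names init_charmap and strcasecmp_table are invented here.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

static unsigned char charmap[UCHAR_MAX + 1];

/* Fill the table: every byte maps to itself, except that upper-case
   letters map to their lower-case counterparts. */
void init_charmap(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        charmap[i] = (unsigned char)tolower(i);
}

int strcasecmp_table(const char *s1, const char *s2)
{
    /* Unsigned pointers keep the table index in the range 0..UCHAR_MAX. */
    const unsigned char *cm = charmap,
                        *us1 = (const unsigned char *)s1,
                        *us2 = (const unsigned char *)s2;

    while (cm[*us1] == cm[*us2++])
        if (*us1++ == '\0')
            return 0;
    return cm[*us1] - cm[*--us2];
}

void demo_charmap(void)
{
    init_charmap();
    assert(strcasecmp_table("Hello", "HELLO") == 0);
    assert(strcasecmp_table("abc", "abd") < 0);
}
```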
answered 2015-12-27T03:36:42.743
0
static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
    int k;
    for (k = 0; k < length; k++)
    {

        if ((str1[k] | 32) != (str2[k] | 32))
            break;
    }

    if (k != length)
        return 1;
    return 0;
}

Reference

answered 2016-02-14T10:17:06.880
0

If we have a null-terminated string:

bool striseq(const char* s1, const char* s2) {
    for (; *s1;) {
        if (tolower(*s1++) != tolower(*s2++))
            return false;
    }
    return *s1 == *s2;
}

Or this version, which uses bitwise operations:

    int striseq(const char* s1,const char* s2)
       {for(;*s1;) if((*s1++|32)!=(*s2++|32)) return 0; return *s1 == *s2;}

I'm not sure whether this works with symbols, since I haven't tested that, but it works fine with letters.
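On the question of symbols: in ASCII, OR-ing with 32 also merges some punctuation pairs, for example '@' with '`' and '[' with '{', so the bitwise version treats those as equal. A guarded variant that folds only A-Z might look like this (a sketch assuming an ASCII-compatible character set; both function names are made up):

```c
#include <assert.h>

/* Folds only the ASCII letters A-Z to lower case; every other byte is
   returned unchanged. */
int ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (c | 32) : c;
}

/* Returns 1 if s1 and s2 are equal ignoring ASCII letter case. */
int striseq_ascii(const char *s1, const char *s2)
{
    const unsigned char *u1 = (const unsigned char *)s1;
    const unsigned char *u2 = (const unsigned char *)s2;

    while (*u1 != '\0' && ascii_fold(*u1) == ascii_fold(*u2)) {
        u1++;
        u2++;
    }
    return ascii_fold(*u1) == ascii_fold(*u2);
}

void demo_symbols(void)
{
    assert(('@' | 32) == '`' && ('[' | 32) == '{');  /* the unguarded trick merges these */
    assert(!striseq_ascii("[", "{"));                /* the guarded version keeps them apart */
    assert(striseq_ascii("AbC-1", "aBc-1"));
}
```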

answered 2022-02-22T18:11:03.490
-1
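#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated lower-case copy of a, or NULL if allocation fails. */
char* lowerCaseWord(const char* a)
{
    size_t len = strlen(a);
    char *b = malloc(len + 1);              /* +1 for the null terminator */
    if (b == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        b[i] = (char)tolower((unsigned char)a[i]);
    b[len] = '\0';
    return b;
}

/* Returns -1 if a copy cannot be allocated; otherwise the strcmp() result. */
int strcmpInsensitive(const char* a, const char* b)
{
    char *la = lowerCaseWord(a);
    char *lb = lowerCaseWord(b);
    int result = (la && lb) ? strcmp(la, lb) : -1;
    free(la);                               /* the copies are no longer needed */
    free(lb);
    return result;
}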

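Good luck

Edit: the lowerCaseWord function takes a char* variable and returns the lower-case value of this char*. For example, a char* value of "AbCdE" will return "abcde".

Basically, it converts the two char* variables to lower case and then uses the strcmp function on them.

For example, if we call the strcmpInsensitive function with the values "AbCdE" and "ABCDE", it first converts both values to lower case ("abcde") and then applies the strcmp function to them.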

answered 2016-01-21T21:51:08.240