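I have two postcodes, stored as char*, that I want to compare, ignoring case. Is there a function that does this?
Or do I have to loop through each character, apply the tolower function, and then do the comparison?
Any idea how such a function would react to digits in the string?
Thanks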
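There is no function that does this in the C standard. Unix systems that comply with POSIX are required to provide strcasecmp in the header strings.h; Microsoft systems have stricmp. To be on the portable side, write your own: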
int strcicmp(char const *a, char const *b)
{
for (;; a++, b++) {
int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
if (d != 0 || !*a)
return d;
}
}
But note that none of these solutions work with UTF-8 strings, only with ASCII ones.
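On the question of digits: tolower() returns any character that is not an uppercase letter unchanged, so digits and spaces simply compare as themselves. A minimal sketch that repeats the strcicmp() above so that it compiles on its own; the postcode values in the note below are made up for illustration:

```c
#include <ctype.h>

/* Same loop as above: compare lowercased bytes until they differ
   or the end of the first string is reached. */
int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}
```

With this, strcicmp("sw1a 1aa", "SW1A 1AA") returns 0, while strcicmp("sw1a 1aa", "SW1A 2AA") is negative because '1' sorts before '2'.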
Take a look at strcasecmp() in strings.h.
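I found a built-in function for this, strcasecmp, which comes from a header that adds extra string functions on top of the standard header.
Here are the relevant signatures: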
int strcasecmp(const char *, const char *);
int strncasecmp(const char *, const char *, size_t);
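I also found an equivalent in the xnu kernel (osfmk/device/subrs.c), implemented with the following code, so you should not expect any difference in behavior with respect to digits compared with the original strcmp function.
int tolower(unsigned char ch) {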
if (ch >= 'A' && ch <= 'Z')
ch = 'a' + (ch - 'A');
return ch;
}
int strcasecmp(const char *s1, const char *s2) {
const unsigned char *us1 = (const u_char *)s1,
*us2 = (const u_char *)s2;
while (tolower(*us1) == tolower(*us2++))
if (*us1++ == '\0')
return (0);
return (tolower(*us1) - tolower(*--us2));
}
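As a usage illustration, here is a small sketch built on the two POSIX signatures listed above. The helper names equal_ci and has_prefix_ci are made up for this example, and it assumes a system that provides strings.h:

```c
#include <string.h>   /* strlen */
#include <strings.h>  /* strcasecmp, strncasecmp (POSIX, not ISO C) */

/* Hypothetical helper: 1 if the two strings are equal ignoring case. */
int equal_ci(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}

/* Hypothetical helper: 1 if s starts with prefix, ignoring case.
   strncasecmp() compares at most strlen(prefix) characters. */
int has_prefix_ci(const char *s, const char *prefix)
{
    return strncasecmp(s, prefix, strlen(prefix)) == 0;
}
```

For example, equal_ci("teSt", "TEst") is 1, and has_prefix_ci("SW1A 1AA", "sw1") is 1 as well.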
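Comparing as lowercase or as uppercase? (common enough issue)
Both functions below return 0 for strcicmpL("A", "a") and strcicmpU("A", "a").
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return results with different signs, because '_' usually lies between the uppercase and the lowercase letters.
This affects the sort order when used with qsort(..., ..., ..., strcicmp). Non-standard library C functions such as the commonly available stricmp() or strcasecmp() tend to be well defined and favor comparing via lowercase. Yet variations exist.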
int strcicmpL(char const *a, char const *b) {
while (*b) {
int d = tolower(*a) - tolower(*b);
if (d) {
return d;
}
a++;
b++;
}
return tolower(*a);
}
int strcicmpU(char const *a, char const *b) {
while (*b) {
int d = toupper(*a) - toupper(*b);
if (d) {
return d;
}
a++;
b++;
}
return toupper(*a);
}
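To make the sign difference concrete, here is a self-contained sketch that repeats the two functions (with unsigned char casts added, for the reason discussed next) so that the claim can be checked directly. It assumes an ASCII execution character set:

```c
#include <ctype.h>

/* Compare after folding both sides to lowercase. */
int strcicmpL(char const *a, char const *b)
{
    while (*b) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d)
            return d;
        a++;
        b++;
    }
    return tolower((unsigned char)*a);
}

/* Compare after folding both sides to uppercase. */
int strcicmpU(char const *a, char const *b)
{
    while (*b) {
        int d = toupper((unsigned char)*a) - toupper((unsigned char)*b);
        if (d)
            return d;
        a++;
        b++;
    }
    return toupper((unsigned char)*a);
}
```

In ASCII, strcicmpL("A", "_") is positive ('a' is 97, '_' is 95) while strcicmpU("A", "_") is negative ('A' is 65), so a sort based on one or the other places "_" on opposite sides of the letters.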
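char can have a negative value. (not rare)
toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char were converted to unsigned char, regardless of whether char is signed or unsigned.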
tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct (Almost - see following)
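char can have a negative value without using 2's complement. (rare)
The cast above does not handle -0 properly, because the bit pattern should be interpreted as unsigned char. To handle all integer encodings properly, change the pointer type first.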
// tolower((unsigned char) *a);
tolower(*(const unsigned char *)a); // Correct
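The pointer-cast form can be wrapped in a small helper so that every call site gets it right. A sketch only; the names lower_at and strcicmp_safe are made up for illustration:

```c
#include <ctype.h>

/* Hypothetical helper: read the byte at p as unsigned char, then lowercase
   it. The value passed to tolower() is always in 0..UCHAR_MAX. */
static int lower_at(const char *p)
{
    return tolower(*(const unsigned char *)p);
}

/* Case-insensitive compare built on the helper. */
int strcicmp_safe(const char *a, const char *b)
{
    for (;; a++, b++) {
        int d = lower_at(a) - lower_at(b);
        if (d != 0 || *a == '\0')
            return d;
    }
}
```

Because the bytes are compared as unsigned values, a byte such as "\xE4" sorts after "a", which matches the ordering strcmp() uses.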
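Locale (less common)
Although character sets using ASCII codes (0-127) are ubiquitous, the remaining codes tend to have locale-specific issues. So strcasecmp("\xE4", "a") might return 0 on one system and non-zero on another.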
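One way to observe the locale dependence is to check how tolower() treats bytes above 127 under a chosen locale. A sketch with hypothetical helper names; which locales other than "C" are available, and what they are called, depends on the system:

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: select a locale by name for character handling.
   Returns 1 on success, 0 if the locale is not available. */
int use_ctype_locale(const char *name)
{
    return setlocale(LC_CTYPE, name) != NULL;
}

/* Hypothetical helper: 1 if bytes x and y fold to the same lowercase
   value under the currently selected LC_CTYPE locale. */
int same_letter_ci(unsigned char x, unsigned char y)
{
    return tolower(x) == tolower(y);
}
```

After use_ctype_locale("C"), same_letter_ci(0xC4, 0xE4) is 0 because only A-Z count as uppercase letters in that locale. Under a Latin-1 locale, where those bytes are Ä and ä, the same call may return 1.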
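Unicode (the way of the future)
If a solution needs to handle more than ASCII, consider a unicode_strcicmp(). As the C library does not provide such a function, a pre-coded function from some alternate library is recommended. Writing your own unicode_strcicmp() is a daunting task.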
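Do all letters map one lower to one upper? (pedantic)
[A-Z] maps one-to-one with [a-z], yet various locales map several lowercase characters to one uppercase character and vice versa. Further, some uppercase characters may lack a lowercase equivalent, and again vice versa.
This obliges code to convert through both toupper() and tolower().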
int d = tolower(toupper(*a)) - tolower(toupper(*b));
Again, sorting results can differ depending on whether code uses tolower(toupper(*a)) or toupper(tolower(*a)).
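Folded into a complete function, the round-trip idea might look like the following. This is a sketch only; the names fold and strcicmp_rt are made up, and bytes are read through an unsigned char pointer as recommended earlier:

```c
#include <ctype.h>

/* Fold a byte to uppercase and then back to lowercase, so that several
   lowercase forms sharing one uppercase form end up with the same value. */
static int fold(const unsigned char *p)
{
    return tolower(toupper(*p));
}

/* Hypothetical round-trip variant of strcicmp(). */
int strcicmp_rt(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    for (;; ua++, ub++) {
        int d = fold(ua) - fold(ub);
        if (d != 0 || *ua == '\0')
            return d;
    }
}
```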
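Portability
@B. Nadolson recommends avoiding rolling your own strcicmp(), and this is reasonable, except when code needs highly equivalent portable functionality.
Below is an approach that even performed faster than some system-provided functions. It does a single compare per loop iteration rather than two, by using two tables that differ at '\0'. Your results may vary.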
static unsigned char low1[UCHAR_MAX + 1] = {
0, 1, 2, 3, ...
'@', 'a', 'b', 'c', ... 'z', '[', ... // @ABC... Z[...
'`', 'a', 'b', 'c', ... 'z', '{', ... // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
// v--- Not zero, but A which matches none in `low1[]`
'A', 1, 2, 3, ...
'@', 'a', 'b', 'c', ... 'z', '[', ...
'`', 'a', 'b', 'c', ... 'z', '{', ...
};
int strcicmp_ch(char const *a, char const *b) {
// compare using tables that differ slightly.
while (low1[*(const unsigned char *)a] == low2[*(const unsigned char *)b]) {
a++;
b++;
}
// Either strings differ or null character detected.
// Perform subtraction using same table.
return (low1[*(const unsigned char *)a] - low1[*(const unsigned char *)b]);
}
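The table initializers above are abbreviated with "...". One way to get runnable tables is to fill them at start-up with tolower(); this is an assumption about how the tables could be produced, not necessarily how they were built originally. The names init_tables and strcicmp_tbl are hypothetical:

```c
#include <ctype.h>
#include <limits.h>

static unsigned char low1[UCHAR_MAX + 1];
static unsigned char low2[UCHAR_MAX + 1];

/* Fill both tables with lowercase mappings; call once before comparing. */
void init_tables(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        low1[c] = (unsigned char)tolower(c);
        low2[c] = (unsigned char)tolower(c);
    }
    /* In the "C" locale low1 never holds 'A' (it maps to 'a'), so this
       entry matches nothing in low1 and the loop stops when b ends. */
    low2[0] = 'A';
}

int strcicmp_tbl(const char *a, const char *b)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    /* One comparison per iteration. low1[0] is 0, and no low2 entry is 0
       (assuming tolower() never maps a nonzero byte to 0), so the loop
       also stops when a ends. */
    while (low1[*ua] == low2[*ub]) {
        ua++;
        ub++;
    }
    /* Subtract using the same table for both sides. */
    return low1[*ua] - low1[*ub];
}
```

init_tables() must run before the first call to strcicmp_tbl(), and again after any locale change that alters tolower().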
I would use stricmp(). It compares two strings without regard to case.
Note that, in some cases, converting the strings to lowercase first can be faster.
As others have said, there is no portable function that works on all systems. You can partially work around this with a simple ifdef:
#include <stdio.h>
#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif
int main() {
printf("%d", strcasecmp("teSt", "TEst"));
}
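I'm not really a fan of the most-upvoted answer here (in part because it seems incorrect: it should continue if it reads a null terminator in either string, but not in both strings at once, and it doesn't do this), so I wrote my own.
It is a case-insensitive counterpart to strncmp(), and has been tested with numerous test cases, as shown below.
It is identical to strncmp() except:
1. It is case-insensitive.
2. The behavior is NOT undefined (it is well-defined) if either string is a null ptr. Regular strncmp() has undefined behavior if either string is a null ptr (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
3. It returns INT_MIN as a special sentinel error value if either input string is a NULL ptr.
LIMITATIONS: Note that this code works on the original 7-bit ASCII character set only (decimal values 0 to 127, inclusive), not on unicode characters, such as the unicode character encodings UTF-8 (the most popular), UTF-16, and UTF-32.
Here is the code only (no comments):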
int strncmpci(const char * str1, const char * str2, size_t num)
{
int ret_code = 0;
size_t chars_compared = 0;
if (!str1 || !str2)
{
ret_code = INT_MIN;
return ret_code;
}
while ((chars_compared < num) && (*str1 || *str2))
{
ret_code = tolower((int)(*str1)) - tolower((int)(*str2));
if (ret_code != 0)
{
break;
}
chars_compared++;
str1++;
str2++;
}
return ret_code;
}
Fully commented version:
/// \brief Perform a case-insensitive string compare (`strncmp()` case-insensitive) to see
/// if two C-strings are equal.
/// \note 1. Identical to `strncmp()` except:
/// 1. It is case-insensitive.
/// 2. The behavior is NOT undefined (it is well-defined) if either string is a null
/// ptr. Regular `strncmp()` has undefined behavior if either string is a null ptr
/// (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
/// 3. It returns `INT_MIN` as a special sentinel value for certain errors.
/// - Posted as an answer here: https://stackoverflow.com/a/55293507/4561887.
/// - Aided/inspired, in part, by `strcicmp()` here:
/// https://stackoverflow.com/a/5820991/4561887.
/// \param[in] str1 C string 1 to be compared.
/// \param[in] str2 C string 2 to be compared.
/// \param[in] num max number of chars to compare
/// \return A comparison code (identical to `strncmp()`, except with the addition
/// of `INT_MIN` as a special sentinel value):
///
/// INT_MIN (usually -2147483648 for int32_t integers) Invalid arguments (one or both
/// of the input strings is a NULL pointer).
/// <0 The first character that does not match has a lower value in str1 than
/// in str2.
/// 0 The contents of both strings are equal.
/// >0 The first character that does not match has a greater value in str1 than
/// in str2.
int strncmpci(const char * str1, const char * str2, size_t num)
{
int ret_code = 0;
size_t chars_compared = 0;
// Check for NULL pointers
if (!str1 || !str2)
{
ret_code = INT_MIN;
return ret_code;
}
// Continue doing case-insensitive comparisons, one-character-at-a-time, of `str1` to `str2`, so
// long as 1st: we have not yet compared the requested number of chars, and 2nd: the next char
// of at least *one* of the strings is not zero (the null terminator for a C-string), meaning
// that string still has more characters in it.
// Note: you MUST check `(chars_compared < num)` FIRST or else dereferencing (reading) `str1` or
// `str2` via `*str1` and `*str2`, respectively, is undefined behavior if you are reading one or
// both of these C-strings outside of their array bounds.
while ((chars_compared < num) && (*str1 || *str2))
{
ret_code = tolower((int)(*str1)) - tolower((int)(*str2));
if (ret_code != 0)
{
// The 2 chars just compared don't match
break;
}
chars_compared++;
str1++;
str2++;
}
return ret_code;
}
Download the full sample code, with unit tests, from my eRCaGuy_hello_world repository: "strncmpci.c":
(this is just a snippet)
int main()
{
printf("-----------------------\n"
"String Comparison Tests\n"
"-----------------------\n\n");
int num_failures_expected = 0;
printf("INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!\n");
EXPECT_EQUALS(strncmpci("hey", "HEY", 3), 'h' - 'H');
num_failures_expected++;
printf("------ beginning ------\n\n");
const char * str1;
const char * str2;
size_t n;
// NULL ptr checks
EXPECT_EQUALS(strncmpci(NULL, "", 0), INT_MIN);
EXPECT_EQUALS(strncmpci("", NULL, 0), INT_MIN);
EXPECT_EQUALS(strncmpci(NULL, NULL, 0), INT_MIN);
EXPECT_EQUALS(strncmpci(NULL, "", 10), INT_MIN);
EXPECT_EQUALS(strncmpci("", NULL, 10), INT_MIN);
EXPECT_EQUALS(strncmpci(NULL, NULL, 10), INT_MIN);
EXPECT_EQUALS(strncmpci("", "", 0), 0);
EXPECT_EQUALS(strncmp("", "", 0), 0);
str1 = "";
str2 = "";
n = 0;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 0);
str1 = "hey";
str2 = "HEY";
n = 0;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 0);
str1 = "hey";
str2 = "HEY";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');
str1 = "heY";
str2 = "HeY";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');
str1 = "hey";
str2 = "HEdY";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 'y' - 'd');
EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');
str1 = "heY";
str2 = "hEYd";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 'e' - 'E');
str1 = "heY";
str2 = "heyd";
n = 6;
EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
EXPECT_EQUALS(strncmp(str1, str2, n), 'Y' - 'y');
str1 = "hey";
str2 = "hey";
n = 6;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 0);
str1 = "hey";
str2 = "heyd";
n = 6;
EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
EXPECT_EQUALS(strncmp(str1, str2, n), -'d');
str1 = "hey";
str2 = "heyd";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 0);
str1 = "hEY";
str2 = "heyYOU";
n = 3;
EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');
str1 = "hEY";
str2 = "heyYOU";
n = 10;
EXPECT_EQUALS(strncmpci(str1, str2, n), -'y');
EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');
str1 = "hEYHowAre";
str2 = "heyYOU";
n = 10;
EXPECT_EQUALS(strncmpci(str1, str2, n), 'h' - 'y');
EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 0);
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 'n' - 'N');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice to meet you.,;", 100), 0);
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO UEET YOU.,;", 100), 'm' - 'u');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice to uEET YOU.,;", 100), 'm' - 'u');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice to UEET YOU.,;", 100), 'm' - 'U');
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 0);
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 'n' - 'N');
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 5), 0);
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice eo uEET YOU.,;", 5), 0);
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 100), 't' - 'e');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice eo uEET YOU.,;", 100), 't' - 'e');
EXPECT_EQUALS(strncmpci("nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
EXPECT_EQUALS(strncmp( "nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
if (globals.error_count == num_failures_expected)
{
printf(ANSI_COLOR_GRN "All unit tests passed!" ANSI_COLOR_OFF "\n");
}
else
{
printf(ANSI_COLOR_RED "FAILED UNIT TESTS! NUMBER OF UNEXPECTED FAILURES = %i"
ANSI_COLOR_OFF "\n", globals.error_count - num_failures_expected);
}
assert(globals.error_count == num_failures_expected);
return globals.error_count;
}
$ gcc -Wall -Wextra -Werror -ggdb -std=c11 -o ./bin/tmp strncmpci.c && ./bin/tmp
-----------------------
String Comparison Tests
-----------------------

INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!
FAILED at line 250 in function main! strncmpci("hey", "HEY", 3) != 'h' - 'H'
  a: strncmpci("hey", "HEY", 3) is 0
  b: 'h' - 'H' is 32

------ beginning ------

All unit tests passed!
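The snippet relies on EXPECT_EQUALS, globals.error_count, and the ANSI_COLOR_* macros, which are defined elsewhere in the full file. To experiment with the snippet on its own, a hypothetical stand-in could look like this. These are not the author's actual definitions; they are only modelled on the output shown above:

```c
#include <stdio.h>

/* Hypothetical stand-ins; the real definitions live in the full file. */
#define ANSI_COLOR_OFF "\033[0m"
#define ANSI_COLOR_GRN "\033[32m"
#define ANSI_COLOR_RED "\033[31m"

/* Global failure counter read by main() in the snippet. */
static struct { int error_count; } globals = { 0 };

/* Evaluate both expressions once as int; on mismatch, print a report
   and bump the global failure counter. */
#define EXPECT_EQUALS(a, b)                                           \
    do {                                                              \
        int a_ = (a);                                                 \
        int b_ = (b);                                                 \
        if (a_ != b_) {                                               \
            globals.error_count++;                                    \
            printf("FAILED at line %d in function %s! %s != %s\n"     \
                   "  a: %s is %d\n  b: %s is %d\n\n",                \
                   __LINE__, __func__, #a, #b, #a, a_, #b, b_);       \
        }                                                             \
    } while (0)
```

To compile the snippet with these stand-ins, it also needs assert.h, limits.h, and string.h, plus one of the strncmpci() definitions above.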
Simple solution:
int str_case_ins_cmp(const char* a, const char* b) {
int rc;
while (1) {
rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);
if (rc || !*a) {
break;
}
++a;
++b;
}
return rc;
}
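If your library doesn't provide one, you can get an idea of how to implement an efficient version from here.
It uses a table for all 256 chars.
Then we just need to traverse the strings and compare the table cells for the given chars: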
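// index through unsigned char so bytes above 127 cannot become negative subscripts
const unsigned char *cm = charmap,
*us1 = (const unsigned char *)s1,
*us2 = (const unsigned char *)s2;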
while (cm[*us1] == cm[*us2++])
if (*us1++ == '\0')
return (0);
return (cm[*us1] - cm[*--us2]);
static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
int k;
for (k = 0; k < length; k++)
{
if ((str1[k] | 32) != (str2[k] | 32))
break;
}
if (k != length)
return 1;
return 0;
}
If we have null-terminated strings:
bool striseq(const char* s1,const char* s2){
for(;*s1;){
if(tolower(*s1++)!=tolower(*s2++))
return false;
}
return *s1 == *s2;
}
Or this version, which uses bitwise operations:
int striseq(const char* s1,const char* s2)
{for(;*s1;) if((*s1++|32)!=(*s2++|32)) return 0; return *s1 == *s2;}
I'm not sure whether this works with symbols; I haven't tested that, but it works fine with letters.
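For symbols the | 32 trick does misfire: it sets bit 5 on every byte, so in ASCII '@' (0x40) and '`' (0x60) compare as equal, as do '[' and '{'. A sketch of a guarded variant that only folds bytes that are ASCII uppercase letters; the names fold_ascii and striseq_ascii are made up for illustration:

```c
/* Fold an ASCII uppercase letter to lowercase; leave other bytes alone. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c | 32) : c;
}

/* Hypothetical guarded variant: 1 if equal ignoring ASCII case, else 0. */
int striseq_ascii(const char *s1, const char *s2)
{
    const unsigned char *u1 = (const unsigned char *)s1;
    const unsigned char *u2 = (const unsigned char *)s2;
    for (; *u1; u1++, u2++) {
        /* If s2 ends first, its '\0' differs from *u1 and we stop here. */
        if (fold_ascii(*u1) != fold_ascii(*u2))
            return 0;
    }
    return *u2 == '\0';
}
```

With the guard in place, striseq_ascii("@", "`") returns 0 while striseq_ascii("Hello", "hELLO") still returns 1.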
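#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated lowercase copy of a; the caller must free() it. */
char* lowerCaseWord(const char* a)
{
    size_t len = strlen(a);
    char *b = malloc(len + 1);          /* +1 for the terminating '\0' */
    if (b == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        b[i] = (char)tolower((unsigned char)a[i]);
    b[len] = '\0';
    return b;
}

int strcmpInsensitive(const char* a, const char* b)
{
    char *la = lowerCaseWord(a);
    char *lb = lowerCaseWord(b);
    /* If an allocation failed, this reports 0 (equal); adjust as needed. */
    int rc = (la && lb) ? strcmp(la, lb) : 0;
    free(la);
    free(lb);
    return rc;
}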
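Good luck
Edit: the lowerCaseWord function takes a char* variable and returns a lowercase copy of it. For example, the char* value "AbCdE" returns "abcde".
Basically, it converts the two char* variables to lowercase and then uses the strcmp function on them.
For example, if we call strcmpInsensitive with the values "AbCdE" and "ABCDE", it first converts both to lowercase ("abcde") and then runs strcmp on them.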