Even though there is already an accepted answer, I want to warn against using strlen(), even if it happens to work without any problem in this case. There are differences between NSString and C strings.
A. -length (NSString) and strlen() have different semantics:
NSString is not(!) \0-terminated, but length based. It can store \0 characters. It is very easy to get a different length if there is a \0 character in the string instance:
NSString *sentence = @"Amin\0Negm";
NSLog(@"length %lu", (unsigned long)[sentence length]); // 9
const char *chars = [sentence cStringUsingEncoding:NSUTF8StringEncoding];
size_t length= strlen(chars);
NSLog(@"strlen %ld", (long)length); // 4
length 9
strlen 4
But -UTF8String and even the -cStringUsingEncoding: used here (both NSString) copy out the whole string stored in the string instance. (I think this is misleading in the case of -cStringUsingEncoding:, because standard string functions like strlen() always treat the first \0 as the end of the string.)
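The same mismatch can be reproduced in plain C without NSString. The following is only a minimal sketch: ByteSlice, slice_length(), slice_strlen() and example_sentence() are hypothetical names made up for illustration, standing in for "a string that stores its length" versus "a string that is scanned for \0".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-based view of a byte buffer: the length is stored,
   not derived from a terminator, so embedded \0 bytes are ordinary content. */
typedef struct {
    const char *bytes;
    size_t length;
} ByteSlice;

/* Length as a length-based string reports it. */
size_t slice_length(ByteSlice s) {
    return s.length;
}

/* Length as strlen() reports it: everything before the first \0. */
size_t slice_strlen(ByteSlice s) {
    return strlen(s.bytes);
}

/* "Amin\0Negm" has 9 content bytes; the literal adds one more \0 at the end,
   which is why 1 is subtracted from sizeof. */
ByteSlice example_sentence(void) {
    static const char sentence[] = "Amin\0Negm";
    ByteSlice s = { sentence, sizeof sentence - 1 };
    return s;
}
```

For the slice returned by example_sentence(), slice_length() gives 9 and slice_strlen() gives 4, the same pair of numbers as in the NSLog output above.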
B. In UTF-8 a character can take multiple bytes. A char in C is one byte. (Byte here does not mean 8 bits, but the smallest addressable unit.)
NSString *sentence = @"Αmin Negm";
NSLog(@"length %lu", (unsigned long)[sentence length]);
const char *chars = [sentence UTF8String];
size_t length= strlen(chars);
NSLog(@"strlen %ld", (long)length);
length 9
strlen 10
WTF happened here? The "Α" of Αmin is not a Latin capital letter A but a Greek capital letter Alpha. In UTF-8 it takes two bytes, so for pure C's strlen() it counts as two characters!
NSLog(@"%x-%x %x-%x", 'A', 'm', (unsigned char)*chars, (unsigned char)*(chars+1) );
41-6d ce-91
The first two numbers are the codes for 'A' and 'm'; the second two are the UTF-8 encoding of the Greek capital letter Alpha (CE 91).
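If you really have to work on the UTF-8 bytes, you can at least count characters instead of bytes. A rough sketch, assuming the input is valid UTF-8: utf8_code_points() and example_utf8() are hypothetical helper names, not library functions. Note that it counts code points, which matches -length only as long as every character fits into a single unichar.

```c
#include <stddef.h>
#include <string.h>

/* Counts code points in a valid, \0-terminated UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). No validation is done. */
size_t utf8_code_points(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;  /* lead byte or plain ASCII byte */
        }
    }
    return count;
}

/* Greek capital letter Alpha (bytes CE 91) followed by "min Negm".
   The literal is split so the hex escape cannot absorb following letters. */
const char *example_utf8(void) {
    return "\xCE\x91" "min Negm";
}
```

With this input strlen() reports 10 bytes, while utf8_code_points() reports 9.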
I do not think it is a good idea to simply switch from NSString to char * without a good reason and a complete understanding of the problems. If you do not expect such characters, use NSASCIIStringEncoding (-cStringUsingEncoding: then returns NULL if the string cannot be converted losslessly). If you do expect such characters, check your code again and again … or read C.
C. C supports wide characters. This is similar to Mac OS's unichar, but the type is wchar_t. There are string functions for wchar_t in wchar.h.
NSString *sentence = @"Αmin Negm";
NSLog(@"length %lu", (unsigned long)[sentence length]);
wchar_t wchars[128]; // take care of the size
wchar_t *wchar = wchars;
for (NSUInteger index = 0; index < [sentence length]; index++)
{
*wchar++ = [sentence characterAtIndex:index];
}
*wchar = '\0';
NSLog(@"widestrlen %lu", (unsigned long)wcslen(wchars));
length 9
widestrlen 9
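The "take care of the size" comment deserves more than a comment. Here is a sketch of a bounds-checked copy in plain C, under the assumption that the 16-bit units (what -characterAtIndex: would hand out) are already in a uint16_t array. copy_units_to_wide() is a hypothetical helper, not part of wchar.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Copies at most capacity - 1 16-bit units into out and always terminates
   it. Returns the number of units copied. Surrogate pairs are copied unit
   by unit, not combined into one wide character. */
size_t copy_units_to_wide(const uint16_t *units, size_t count,
                          wchar_t *out, size_t capacity) {
    if (capacity == 0) {
        return 0;  /* no room even for the terminator */
    }
    size_t n = count < capacity - 1 ? count : capacity - 1;
    for (size_t i = 0; i < n; i++) {
        out[i] = (wchar_t)units[i];
    }
    out[n] = L'\0';
    return n;
}
```

Copying the nine units of the sentence into a buffer of 128 wide characters gives a wcslen() of 9; with a buffer of only 4 the copy is cut off after 3 characters instead of overrunning the array.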
D. Obviously you want to iterate through the string. The common pattern in pure C is not to use an index and compare it to the length, and definitely not to call strlen() on every pass through the loop, because that is expensive. (C strings are not length based, so the whole string has to be scanned over and over.) You simply increment the pointer to the next char:
char letter;
while ( (letter = *chars++) ) {…}
or
do
{
// *chars is the current char (the body also runs once for the terminating \0)
} while (*chars++);
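To make the difference concrete, here are both styles side by side as a small sketch. count_char_indexed() and count_char_walking() are hypothetical names; they count how often a given char occurs. Whether the strlen() call in the first version really runs on every pass depends on the compiler, so treat the cost argument as the worst case.

```c
#include <stddef.h>
#include <string.h>

/* Index-based loop with strlen() in the condition: unless the compiler
   hoists the call, the string is rescanned on every pass. */
size_t count_char_indexed(const char *s, char wanted) {
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++) {
        if (s[i] == wanted) {
            count++;
        }
    }
    return count;
}

/* Pointer-walking loop: a single pass that stops at the terminating \0. */
size_t count_char_walking(const char *s, char wanted) {
    size_t count = 0;
    char letter;
    while ((letter = *s++) != '\0') {
        if (letter == wanted) {
            count++;
        }
    }
    return count;
}
```

Both functions return the same result, for example 2 for the letter 'm' in "Amin Negm"; only the amount of scanning differs.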