
I'm loading a website's HTML with this call:

    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    [request setValue:@"utf-8" forHTTPHeaderField:@"Accept-Encoding"];
    [request setValue:@"text/html" forHTTPHeaderField:@"Accept"];
    [NSURLConnection sendAsynchronousRequest:request
                                       queue:[NSOperationQueue currentQueue]
                           completionHandler:^(NSURLResponse *response, NSData *data, NSError *error) { ... }

Then, to convert the NSData to an NSString, I need to know the encoding, so I call:

NSString *textEncoding = [response textEncodingName];

from within the block, but it returns nil for websites that do not specify the "Content-Encoding" header field.

If I don't know the encoding, [[NSString alloc] initWithData:data encoding:responseEncoding] won't give me readable HTML.

How can I detect the correct encoding for websites that do not send a "Content-Encoding" header field?


2 Answers


You can try different encodings and see which one produces readable text:

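// Candidate encodings, tried in order. NSStringEncoding is an NSUInteger;
// several of these constants do not fit in an int.
static const NSStringEncoding encodingPriority[] = {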
    NSUTF8StringEncoding,
    NSASCIIStringEncoding,
    NSISOLatin1StringEncoding,
    NSISOLatin2StringEncoding,
    NSUnicodeStringEncoding,
    NSWindowsCP1251StringEncoding,
    NSWindowsCP1252StringEncoding,
    NSWindowsCP1253StringEncoding,
    NSWindowsCP1254StringEncoding,
    NSWindowsCP1250StringEncoding,
    NSNEXTSTEPStringEncoding,
    NSJapaneseEUCStringEncoding,
    NSNonLossyASCIIStringEncoding,
    NSShiftJISStringEncoding,          /* kCFStringEncodingDOSJapanese */
    NSISO2022JPStringEncoding,        /* ISO 2022 Japanese encoding for e-mail */
    NSMacOSRomanStringEncoding,
    NSUTF16BigEndianStringEncoding,
    NSUTF16LittleEndianStringEncoding,
    NSUTF32StringEncoding,
    NSUTF32BigEndianStringEncoding,
    NSUTF32LittleEndianStringEncoding
};

#define REQUIRED_HTML_STRING    @"<html"

- (NSString *)htmlStringForUnknownEncodingData:(NSData *)data detectedEncoding:(NSStringEncoding *)detectedEncoding
{
    NSStringEncoding encoding;
    NSString *html;

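    // sizeof(array) is a byte count, so divide by the element size to get the element count
    for (NSUInteger i = 0; i < sizeof(encodingPriority) / sizeof(encodingPriority[0]); i++) {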
        encoding = encodingPriority[i];

        // try this encoding
        html = [[NSString alloc] initWithData:data encoding:encoding];

        // a wrong encoding can still decode into unreadable text, so require a known marker in the result
        if (html && [html rangeOfString:REQUIRED_HTML_STRING options:NSCaseInsensitiveSearch].location != NSNotFound) {
            *detectedEncoding = encoding;
            return html;
        }
    }
    return nil;
}

Then, to detect which encoding the HTML in the NSData uses, call:

NSStringEncoding encoding;
NSString *html = [self htmlStringForUnknownEncodingData:data detectedEncoding:&encoding];

if (html)
    NSLog(@"Encoding detected!");
else
    NSLog(@"No encoding detected");
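
As an alternative to trying encodings by hand, newer SDKs (iOS 8 / OS X 10.10 and later) provide a detection method on NSString. The following is a minimal sketch, assuming that method is available on your deployment target; the helper name `HTMLStringByDetectingEncoding` is hypothetical, and suggesting UTF-8 first is only an assumption about typical web pages:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: asks Foundation to guess the encoding of the data.
// Returns nil if no encoding could be determined.
static NSString *HTMLStringByDetectingEncoding(NSData *data, NSStringEncoding *detectedEncoding)
{
    NSString *converted = nil;
    BOOL usedLossyConversion = NO;

    NSDictionary *options = @{
        // Assumption: UTF-8 is the most likely candidate, so suggest it first
        NSStringEncodingDetectionSuggestedEncodingsKey : @[ @(NSUTF8StringEncoding) ],
        // Do not accept conversions that would drop or replace characters
        NSStringEncodingDetectionAllowLossyKey : @NO
    };

    NSStringEncoding encoding = [NSString stringEncodingForData:data
                                                encodingOptions:options
                                                convertedString:&converted
                                            usedLossyConversion:&usedLossyConversion];

    // The method returns 0 when it cannot determine an encoding
    if (encoding == 0) {
        return nil;
    }
    if (detectedEncoding) {
        *detectedEncoding = encoding;
    }
    return converted;
}
```

Detection of this kind is a heuristic, so it may still be worth checking the result for an expected marker such as "<html", as the loop above does.
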
answered 2013-07-17T14:38:45.677

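I tried @Kof's code. I noticed that the encoding I get from the response is utf-8. If you pass that string directly as the encoding, as in [[NSString alloc] initWithData:data encoding:@"utf-8"], it will definitely return null. That is because the encoding parameter takes an NSStringEncoding value (an NS_ENUM constant), not a string. If you try [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding], it returns a result.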
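
When the response does provide a charset name, the name can be converted to an NSStringEncoding instead of being passed as a string. A minimal sketch, assuming ARC and that `textEncodingName` returns an IANA charset name such as "utf-8"; `EncodingFromIANAName` is a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: maps an IANA charset name (e.g. @"utf-8") to an NSStringEncoding.
// Returns the fallback when the name is nil, empty, or not recognized.
static NSStringEncoding EncodingFromIANAName(NSString *name, NSStringEncoding fallback)
{
    if (name.length == 0) {
        return fallback;
    }
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return fallback;
    }
    return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}
```

Inside the completion handler this could be used as `EncodingFromIANAName([response textEncodingName], NSUTF8StringEncoding)`, with the result passed to `initWithData:encoding:`. Falling back to UTF-8 here is an assumption, not something the response guarantees.
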

answered 2016-10-19T07:14:21.347