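This is a question about Objective-C. I wrote a program that downloads the entire HTML of a page and searches it with a regular expression. I have uploaded the program to GitHub. However, an exception occurs.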

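The purpose of this program is to get the "og:image" by regular expression matching. This is the image that is displayed when you post a URL on Facebook. To set this image, you write HTML like the following: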

<meta property="og:image"

So I wrote a program that gets the entire HTML and finds the og:image part. The code is as follows:

// Web page address
NSURL *url = [NSURL URLWithString:textField.text];

// Get the web page HTML
NSString *string = 
[NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:nil];

// prepare regular expression to find text
NSError *error = nil;
NSRegularExpression *regexp =
[NSRegularExpression regularExpressionWithPattern:
 @"<meta property=\"og:image\" content=\".+\""
                                          options:0
                                            error:&error];

@try {
    // find by regular expression
    NSTextCheckingResult *match =
    [regexp firstMatchInString:string options:0 range:NSMakeRange(0, string.length)];

    // get the first result
    NSRange resultRange = [match rangeAtIndex:0];
    NSLog(@"match=%@", [string substringWithRange:resultRange]); 

    if (match) {

        // get the og:image URL from the find result
        NSRange urlRange = NSMakeRange(resultRange.location + 35, resultRange.length - 35 - 1);
        NSURL *urlOgImage = [NSURL URLWithString:[string substringWithRange:urlRange]];
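        imageView.image = [UIImage imageWithData:[NSData dataWithContentsOfURL:urlOgImage]];
    }
}
@catch (NSException *exception) {
    // log whatever was raised while matching or slicing the string
    NSLog(@"exception=%@", exception);
}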

The whole code is on GitHub.



  • success case: http://www.nicovideo.jp/watch/1343369790

  • failure case: http://business.nikkeibp.co.jp/article/NBD/20120727/235043/?ST=pc

Screenshots are here: https://github.com/weed/p120728_GetOgImage/blob/master/readme.md

Why does the exception occur? Please tell me. Thank you for your help.


2 Answers


A friend kindly pointed out that I should consider the character encoding. The page at the first URL is encoded in UTF-8, and the page at the second URL is encoded in EUC-JP.

With the code below I could get the og:image of the second URL shown above.

- (NSString *)encodedStringWithContentsOfURL:(NSURL *)url
{
    // Get the web page HTML as raw bytes
    NSData *data = [NSData dataWithContentsOfURL:url];

    // Candidate encodings, tried in this order
    NSStringEncoding enc_arr[] = {
        NSUTF8StringEncoding,           // UTF-8
        NSShiftJISStringEncoding,       // Shift_JIS
        NSJapaneseEUCStringEncoding,    // EUC-JP
        NSISO2022JPStringEncoding,      // JIS
        NSUnicodeStringEncoding,        // Unicode
        NSASCIIStringEncoding           // ASCII
    };
    NSString *data_str = nil;
    int max = sizeof(enc_arr) / sizeof(enc_arr[0]);
    for (int i = 0; i < max; i++) {
        // initWithData:encoding: returns nil when the bytes are invalid for the encoding
        data_str = [[NSString alloc] initWithData:data encoding:enc_arr[i]];
        if (data_str != nil) {
            break;
        }
    }
    return data_str;
}
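
To illustrate why the UTF-8-only read failed on the second page, here is a minimal sketch. The helper names `SampleEUCJPData` and `DecodeSample` are hypothetical (they are not part of NSString+Encode), and the six bytes are assumed to be the EUC-JP encoding of "日本語". Decoding them as UTF-8 yields nil, while decoding them as EUC-JP yields the text.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical sample: the bytes assumed to encode "日本語" in EUC-JP.
static NSData *SampleEUCJPData(void) {
    const unsigned char bytes[] = {0xC6, 0xFC, 0xCB, 0xDC, 0xB8, 0xEC};
    return [NSData dataWithBytes:bytes length:sizeof(bytes)];
}

// Hypothetical helper: decodes the sample with the given encoding.
// Returns nil when the bytes are not valid in that encoding.
static NSString *DecodeSample(NSStringEncoding encoding) {
    return [[NSString alloc] initWithData:SampleEUCJPData() encoding:encoding];
}
```

`DecodeSample(NSUTF8StringEncoding)` is nil here, which is the same situation the original `stringWithContentsOfURL:encoding:error:` call ran into. Note that the first encoding that happens to decode without error is not guaranteed to be the page's real encoding, so the order of the candidate list matters.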

I made a small library for checking character encoding, named NSString+Encode. The whole code is on GitHub:


answered 2012-07-28T10:00:02.060

It looks like your regular expression is not matching anything on the second page. Have you tested the HTML source of that page against your regular expression in a regex tester?

Something like this should do the trick: http://regexpal.com/
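
Whatever the pattern ends up being, it is probably worth handling a nil HTML string and a failed match before taking any substring. The sketch below is one way to do that, under stated assumptions: `OGImageURLStringFromHTML` is a hypothetical helper name (it is not from the question's repository), and it uses a capture group instead of the fixed offset of 35, which is a change from the original code.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the og:image URL string found in `html`,
// or nil when `html` is nil or the pattern does not match.
static NSString *OGImageURLStringFromHTML(NSString *html) {
    if (html == nil) {
        // e.g. the page could not be decoded with the requested encoding
        return nil;
    }

    NSError *error = nil;
    // [^"]+ stops at the first closing quote; the parentheses capture the URL
    NSRegularExpression *regexp =
        [NSRegularExpression regularExpressionWithPattern:
             @"<meta property=\"og:image\" content=\"([^\"]+)\""
                                                  options:0
                                                    error:&error];
    if (regexp == nil) {
        return nil;
    }

    NSTextCheckingResult *match =
        [regexp firstMatchInString:html
                           options:0
                             range:NSMakeRange(0, html.length)];
    if (match == nil) {
        return nil;
    }

    // capture group 1 holds only the URL, so no manual offset arithmetic is needed
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

A caller can then test the return value for nil before building an `NSURL` from it, instead of relying on `@try`.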

answered 2012-07-28T09:07:43.220