
I use this code to retrieve the first image of a feed. The htmlString contains HTML tags; in some cases I correctly get the first image, but in other cases I get a nil NSString, and I don't understand why. I'm sure that the htmlString contains an image. For example, for this string I can't get the first image:

Example: CultOfMac ha segnalato la disponibilità sul Mac App Store delle prime applicazioni sviluppate appositamente per OS X 10.7 Lion. In passato situazioni come questa hanno preceduto di qualche ora il lancio di nuovi prodotti, basti pensare al rilascio di iOS 4.2.1 per iPhone e iPad, primo firmware che ha unificato la numerazione delle versioni di sistema di questi dispositivi. OS X Lion è più vicino al rilascio?</p> <p><img class="aligncenter size-full wp-image-21789" title="mac-app-store_t" src="http://static.slidetomac.com/wp-content/uploads/2011/07/mac-app-store_t.jpg" alt="" width="507" height="300" /></p> <p><span id="more-21780"></span></p> <p>Solo qualche giorno fa Apple....

The part of the markup that I need is: <img class="aligncenter size-full wp-image-21789" title="mac-app-store_t" src="http://static.slidetomac.com/wp-content/uploads/2011/07/mac-app-store_t.jpg" alt="" width="507" height="300" />

But I can't correctly get the URL of the image. What's wrong in my code? Thanks.

- (NSString *)getFirstImage:(NSString *)htmlString {
    NSString *urlImage = nil;
    NSScanner *theScanner = [NSScanner scannerWithString:htmlString];
    // find start of IMG tag
    [theScanner scanUpToString:@"<img" intoString:nil];
    do {
        [theScanner scanUpToString:@"src" intoString:nil];
        NSCharacterSet *charset = [NSCharacterSet characterSetWithCharactersInString:@"\"'"];
        [theScanner scanUpToCharactersFromSet:charset intoString:nil];
        [theScanner scanCharactersFromSet:charset intoString:nil];
        [theScanner scanUpToCharactersFromSet:charset intoString:&urlImage];

        if ([urlImage rangeOfString:@"imagebutton.gif"].location == NSNotFound) return urlImage;
    } while (![theScanner isAtEnd]);

    if ([theScanner isAtEnd]) return nil;
    return urlImage;
}

1 Answer


What's wrong in my code?

You are using a scanning parser to parse HTML.

HTML parsing is very, very hard: it has all of the problems of an XML parser, combined with a pervasive lack of consistency in real-world markup.

Fortunately, HTML parsing is also very much a solved problem.

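Use a proper HTML parser. libxml2 has an HTML-compliant mode (the parser declared in libxml/HTMLparser.h) that tolerates the malformed markup found in real feeds.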
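As a rough illustration of that suggestion, here is a minimal sketch that finds the first image URL with libxml2's HTML parser and an XPath query instead of NSScanner. The function name FirstImageURL is hypothetical, and the sketch assumes the feed text is UTF-8, that the project links against libxml2, and that the libxml2 headers are on the header search path (commonly /usr/include/libxml2). It keeps the "imagebutton.gif" filter from the question; treat it as a starting point, not a finished implementation.

```objc
#import <Foundation/Foundation.h>
#import <libxml/HTMLparser.h>
#import <libxml/xpath.h>

// Hypothetical helper (not from the question): returns the src of the first
// <img> whose URL does not contain "imagebutton.gif", or nil if none exists.
static NSString *FirstImageURL(NSString *htmlString) {
    NSData *data = [htmlString dataUsingEncoding:NSUTF8StringEncoding];
    if (data.length == 0) return nil;

    // Parse leniently: recover from malformed markup and stay silent about it.
    htmlDocPtr doc = htmlReadMemory(data.bytes, (int)data.length, NULL, "UTF-8",
                                    HTML_PARSE_RECOVER | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);
    if (doc == NULL) return nil;

    NSString *result = nil;
    xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
    // Select every src attribute of every <img> element, in document order.
    xmlXPathObjectPtr found = ctx ? xmlXPathEvalExpression(BAD_CAST "//img/@src", ctx) : NULL;

    if (found && found->nodesetval) {
        for (int i = 0; i < found->nodesetval->nodeNr && result == nil; i++) {
            xmlChar *value = xmlNodeGetContent(found->nodesetval->nodeTab[i]);
            if (value == NULL) continue;
            NSString *url = [NSString stringWithUTF8String:(const char *)value];
            xmlFree(value);
            // Same filter as the original method: skip the button image.
            if (url && [url rangeOfString:@"imagebutton.gif"].location == NSNotFound) {
                result = url;
            }
        }
    }

    // Release everything libxml2 allocated.
    if (found) xmlXPathFreeObject(found);
    if (ctx) xmlXPathFreeContext(ctx);
    xmlFreeDoc(doc);
    return result;
}
```

Because the parser builds a real document tree, the query does not depend on attribute order, quote style, or whether the attribute value is quoted at all, which are exactly the cases a hand-written scanner tends to miss.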

There are a slew of questions and answers on Stack Overflow about HTML parsing.

Answered 2011-07-14T22:39:05.540