OK, so I'm developing a web crawler that fetches web pages and converts them into paragraphs of text. To remove the tags themselves, I found this on Stack Overflow:
- (NSString *) stripTags:(NSString *)str
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    // Do not skip whitespace, so the text is copied exactly as written.
    [scanner setCharactersToBeSkipped:nil];
    NSString *s = nil;
    while (![scanner isAtEnd])
    {
        // Copy the text that precedes the next tag.
        [scanner scanUpToString:@"<" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        // Skip the tag itself, then step past its closing '>'.
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        s = nil;
    }
    return ms;
}
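It works, but it only removes the tags themselves, not the content between the script and style tags (obviously I don't want to remove the content between all tags, since that would leave me with an empty string).
Is there a way to cut out the script and style elements specifically?
Thanks very much in advance.
Edit:
I tried changing my code to: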
- (NSString *) stripTags:(NSString *)str
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil];
    NSString *s = nil;
    while (![scanner isAtEnd])
    {
        [scanner scanUpToString:@"<script" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@"script>" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        [scanner scanUpToString:@"<" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        s = nil;
    }
    return ms;
}
But the output still contains the scripts and the CSS.
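
One possible direction, as a rough sketch and not a definitive fix: keep the same NSScanner loop, but each time the scanner stops on a `<`, check whether it is the start of a `<script` or `<style` element. If it is, skip everything through the matching closing tag. Otherwise skip only through the next `>`, as before. The edited attempt most likely fails because its first `scanUpToString:@"<script"` call copies everything before the first script element into the output, tags and style contents included. The function name `StripTagsSkippingScriptAndStyle` is made up for this illustration. The sketch assumes reasonably well-formed HTML and does no real parsing, so a `>` inside an attribute value, or a closing tag written with extra whitespace such as `</script >`, will confuse it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not from the original post. Removes tags the same way
// stripTags: does, and additionally drops everything between
// <script ...> and </script>, and between <style ...> and </style>.
static NSString *StripTagsSkippingScriptAndStyle(NSString *html)
{
    NSMutableString *result = [NSMutableString stringWithCapacity:[html length]];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    [scanner setCharactersToBeSkipped:nil]; // keep whitespace in the text
    [scanner setCaseSensitive:NO];          // also match <SCRIPT>, <Style>, ...

    while (![scanner isAtEnd]) {
        // Copy the plain text that precedes the next tag.
        NSString *text = nil;
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        if ([scanner isAtEnd]) {
            break;
        }

        // The scanner now sits on a '<'. Note: this is a simple prefix match,
        // so any tag whose name merely begins with "script" or "style"
        // would be treated the same way.
        if ([scanner scanString:@"<script" intoString:NULL]) {
            // Skip the rest of the opening tag, the script body, and the closing tag.
            [scanner scanUpToString:@"</script>" intoString:NULL];
            [scanner scanString:@"</script>" intoString:NULL];
        } else if ([scanner scanString:@"<style" intoString:NULL]) {
            [scanner scanUpToString:@"</style>" intoString:NULL];
            [scanner scanString:@"</style>" intoString:NULL];
        } else {
            // Any other tag: skip through its closing '>'.
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return result;
}
```

For example, calling `StripTagsSkippingScriptAndStyle(@"<style>p{color:red}</style><p>Hello</p>")` should leave just `Hello`. For anything beyond simple pages, a real HTML parser is probably the safer choice, since scanner-based stripping cannot handle every edge case.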