
I have a long, complex XHTML file that contains CSS. Searching Google and this site, I found some libraries that are useful for parsing XHTML:

  • NSXMLParser
  • TBXML
  • And a few others

However, I'd like to know whether there is an iPhone library that can convert an XHTML + CSS document into an NSAttributedString (only the text, of course).

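I've been thinking about this problem and have a few ideas, but I suspect they won't be very efficient. My main idea consists of the following steps: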

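  • Detect every tag in the XHTML file that has an id or class attribute, and get the range of the string it applies to (I haven't managed to do this).
  • Store all the CSS properties in an NSDictionary that contains further NSDictionary objects. Something like this: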

    NSDictionary *mainDict = @{
        @"an id": @{
            @"color": @"#00ff00",
            @"font-size": @"1em"
        },
        @"another id": @{
            /* ... */
        }
    };
    
  • Convert these CSS properties into NSAttributedString attribute dictionaries.
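The first two steps can be sketched with Foundation alone. This is a minimal illustration under stated assumptions, not a library: it assumes ARC, modern Objective-C literal syntax, and well-formed XHTML (NSXMLParser only accepts well-formed XML and may not handle named entities such as `&nbsp;` unless you deal with them yourself). It records one selector per element (the id if present, otherwise the raw class attribute), and its CSS splitter is naive: it ignores comments, `@media` blocks, and `;` or `}` inside quoted values. `XHTMLRangeCollector`, `CSSDeclarationsFromString`, and `CSSRulesFromString` are hypothetical names.

```objc
#import <Foundation/Foundation.h>

// Step 1 (sketch): collect the document's text and the range covered by every
// element that has an id (recorded as "#id") or a class (recorded as ".class").
@interface XHTMLRangeCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;   // all character data, in document order
@property (nonatomic, strong) NSMutableArray *styled;  // @{ @"selector": ..., @"range": NSValue }
@property (nonatomic, strong) NSMutableArray *stack;   // open elements: @[selector or NSNull, start offset]
@end

@implementation XHTMLRangeCollector
- (instancetype)init {
    if ((self = [super init])) {
        _text = [NSMutableString string];
        _styled = [NSMutableArray array];
        _stack = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributes {
    id selector = [NSNull null];                        // NSNull marks an unstyled element
    if (attributes[@"id"]) {
        selector = [@"#" stringByAppendingString:attributes[@"id"]];
    } else if (attributes[@"class"]) {
        selector = [@"." stringByAppendingString:attributes[@"class"]];
    }
    // The element's text starts wherever the collected text currently ends.
    [self.stack addObject:@[selector, @(self.text.length)]];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    NSArray *open = [self.stack lastObject];
    [self.stack removeLastObject];
    if (open[0] == [NSNull null]) return;
    NSUInteger start = [open[1] unsignedIntegerValue];
    NSRange range = NSMakeRange(start, self.text.length - start);
    [self.styled addObject:@{ @"selector": open[0], @"range": [NSValue valueWithRange:range] }];
}
@end

// Step 2 (sketch): "color: #00ff00; font-size: 1em" -> @{ @"color": @"#00ff00", @"font-size": @"1em" }.
static NSDictionary *CSSDeclarationsFromString(NSString *block) {
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableDictionary *declarations = [NSMutableDictionary dictionary];
    for (NSString *declaration in [block componentsSeparatedByString:@";"]) {
        NSRange colon = [declaration rangeOfString:@":"];
        if (colon.location == NSNotFound) continue;     // empty or malformed piece
        NSString *property = [[declaration substringToIndex:colon.location] stringByTrimmingCharactersInSet:ws];
        NSString *value = [[declaration substringFromIndex:NSMaxRange(colon)] stringByTrimmingCharactersInSet:ws];
        if (property.length && value.length) declarations[[property lowercaseString]] = value;
    }
    return declarations;
}

// Step 2 (sketch): "#a { ... } .b, .c { ... }" -> @{ @"#a": @{...}, @".b": @{...}, @".c": @{...} }.
static NSDictionary *CSSRulesFromString(NSString *css) {
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableDictionary *rules = [NSMutableDictionary dictionary];
    for (NSString *chunk in [css componentsSeparatedByString:@"}"]) {
        NSRange brace = [chunk rangeOfString:@"{"];
        if (brace.location == NSNotFound) continue;     // text after the last rule
        NSDictionary *declarations = CSSDeclarationsFromString([chunk substringFromIndex:NSMaxRange(brace)]);
        for (NSString *rawSelector in [[chunk substringToIndex:brace.location] componentsSeparatedByString:@","]) {
            NSString *selector = [rawSelector stringByTrimmingCharactersInSet:ws];
            if (selector.length == 0) continue;
            NSMutableDictionary *merged = [NSMutableDictionary dictionary];
            if (rules[selector]) [merged addEntriesFromDictionary:rules[selector]];
            [merged addEntriesFromDictionary:declarations];   // later rules win
            rules[selector] = merged;
        }
    }
    return rules;
}
```

With both pieces in place, styling a range means looking up its recorded selector in the rules dictionary. Note that the collected text includes any whitespace between tags, so the ranges refer to that raw text.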

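I know this is complicated. I don't need you to provide code (though it would be great if you did); I just want a link to a library or, if none exists, some advice on writing the parser myself.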
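For the third step, much of the work is turning CSS value strings into numbers. A minimal sketch covering only hex colors and px/pt/em/% font sizes follows; the function names are hypothetical, and treating one CSS px as one UIKit point is an assumption you may want to change. On iOS you would then pass the components to UIColor's `colorWithRed:green:blue:alpha:` and the size to a UIFont method such as `systemFontOfSize:`, storing the results under the keys your renderer expects (for example `NSForegroundColorAttributeName` and `NSFontAttributeName` on iOS 6 and later, or the `kCT…` Core Text keys before that).

```objc
#import <Foundation/Foundation.h>

// Parses "#rrggbb" or the shorthand "#rgb" into red, green and blue components
// in the range 0...1. Named colors, rgb() and the like return NO.
static BOOL CSSColorComponents(NSString *value, double rgb[3]) {
    NSString *hex = [value stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if (![hex hasPrefix:@"#"]) return NO;
    hex = [hex substringFromIndex:1];
    if (hex.length == 3) {                               // "#0f0" is shorthand for "#00ff00"
        unichar c[3];
        [hex getCharacters:c range:NSMakeRange(0, 3)];
        hex = [NSString stringWithFormat:@"%C%C%C%C%C%C", c[0], c[0], c[1], c[1], c[2], c[2]];
    }
    NSCharacterSet *notHex = [[NSCharacterSet characterSetWithCharactersInString:@"0123456789abcdefABCDEF"] invertedSet];
    if (hex.length != 6 || [hex rangeOfCharacterFromSet:notHex].location != NSNotFound) return NO;
    unsigned int packed = 0;
    [[NSScanner scannerWithString:hex] scanHexInt:&packed];
    rgb[0] = ((packed >> 16) & 0xFF) / 255.0;
    rgb[1] = ((packed >> 8) & 0xFF) / 255.0;
    rgb[2] = (packed & 0xFF) / 255.0;
    return YES;
}

// Converts a CSS font-size to a point size, relative to the parent's size.
static double CSSFontSizeInPoints(NSString *value, double parentSize) {
    NSString *v = [[value stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] lowercaseString];
    double number = [v doubleValue];                     // leading number, e.g. 1.5 from "1.5em"
    if ([v hasSuffix:@"rem"]) return parentSize;         // root-relative sizes are not handled here
    if ([v hasSuffix:@"em"])  return number * parentSize;
    if ([v hasSuffix:@"%"])   return number * parentSize / 100.0;
    if ([v hasSuffix:@"px"])  return number;             // assumption: 1 CSS px == 1 UIKit point
    if ([v hasSuffix:@"pt"])  return number * 4.0 / 3.0; // CSS defines 1pt as 4/3 px
    return parentSize;                                   // keywords such as "larger": keep the parent size
}
```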

Of course, if you need more information, please ask in the comments.

Thanks!


2 Answers


Whether this does what you want depends on your needs, but DTCoreText has an HTML-to-NSAttributedString converter. It is very specific to what DTCoreText wants or needs to do, but it might at least point you in the right direction.

Answered 2012-06-03T10:36:19.853

My approach to parsing an HTML string into an NSAttributedString is to recursively append each parsed node (and its children) to an NSMutableAttributedString.

I'm not ready to publish my full code anywhere yet, but hopefully this gives you some hints...

NSString+HTML.h

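#import <Foundation/Foundation.h>

// Keys of the dictionary returned by -toHTMLElements
#define HTML_ATTRIBUTEDSTRING @"attributedString"  // html main body
#define HTML_INSETS           @"insets"            // images and/or videos with range info
#define HTML_AS               @"as"                // hrefs with range info

@interface NSString (HTML)

/*  - toHTMLElements
 *  Parses the string itself into a dictionary of HTML elements stored under the keys above.
 */
- (NSMutableDictionary *)toHTMLElements;

@end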

NSString+HTML.m

- (NSMutableDictionary*) toHTMLElements {

    // …
    // handle escape encoding here
    // assume that NSString* htmlString is the processed string;
    // …


    NSMutableDictionary * htmlElements = [[NSMutableDictionary dictionary] retain];

    NSMutableAttributedString * attributedString = [[[NSMutableAttributedString alloc] init] autorelease];
    NSMutableArray * insets = [NSMutableArray array];
    NSMutableArray * as     = [NSMutableArray array];

    [htmlElements setObject:attributedString forKey:HTML_ATTRIBUTEDSTRING];
    [htmlElements setObject:insets forKey:HTML_INSETS];
    [htmlElements setObject:as forKey:HTML_AS];


    // Parse the HTML with an XML parser.
    // CXXML is a variant of TBXML (http://www.tbxml.co.uk/ ) that can handle inline tags such as <span>.
    // Its code is not public yet, so write your own HTML/XML parser that supports inline tags.

    CXXML * xml = [CXXML tbxmlWithXMLString:htmlString];
    TBXMLElement * root = xml.rootXMLElement;

    TBXMLElement * next = root->firstChild;

    while (next != NULL) {   // TBXMLElement is a C struct, so compare against NULL
        // Give special treatment to particular tags here if needed.
        NSString * tagName = [CXXML elementName:next];

        [self appendXMLElement:next withAttributes:[HTMLElementAttributes defaultAttributesFor:tagName] toHTMLElements:htmlElements];

        next = next->nextSibling;
    }

    return [htmlElements autorelease];
}

- (void) appendXMLElement:(TBXMLElement*)aElement withAttributes:(NSDictionary*)parentAttributes toHTMLElements:(NSMutableDictionary*) htmlElements {

    // Parse aElement and its attribute values here.
    // Assume NSString *tagAttrString is this tag's CSS declaration string (from its "style" attribute or a CSS file), e.g. width:200px; color:#123456;
    // Let an external HTMLElementAttributes class merge it with the attributes inherited from the parent node.

    NSDictionary * tagAttr = [HTMLElementAttributes updateAttributes: parentAttributes withCSSAttributes:tagAttrString];

    // Create your NSAttributedString styled by tagAttr,
    // create inset objects such as images, videos, or hyperlinks,
    // then store them in htmlElements.

    // once this tag is handled, recursively visit and process the current tag's children

    TBXMLElement * nextChild = aElement->firstChild;

    while (nextChild != NULL) {
        [self appendXMLElement:nextChild withAttributes:tagAttr toHTMLElements:htmlElements];
        nextChild = nextChild->nextSibling;
    }
}
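The recursion above can be tried without TBXML or the unpublished CXXML and HTMLElementAttributes classes. The sketch below swaps in a hypothetical `HTMLNode` class whose CSS is already parsed into a dictionary, plus a hypothetical `AppendNode` function; each element copies its parent's attributes and overrides them with its own, which is what lets properties such as color and font-size cascade to children. To stay Foundation-only it uses the CSS property names directly as attribute keys; a real implementation would translate them into font and color attributes. It assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical DOM node: either a text node (text != nil) or an element
// whose CSS has already been parsed into a property -> value dictionary.
@interface HTMLNode : NSObject
@property (nonatomic, copy) NSString *text;
@property (nonatomic, copy) NSDictionary *css;
@property (nonatomic, copy) NSArray *children;
+ (instancetype)textNode:(NSString *)text;
+ (instancetype)elementWithCSS:(NSDictionary *)css children:(NSArray *)children;
@end

@implementation HTMLNode
+ (instancetype)textNode:(NSString *)text {
    HTMLNode *node = [[self alloc] init];
    node.text = text;
    return node;
}
+ (instancetype)elementWithCSS:(NSDictionary *)css children:(NSArray *)children {
    HTMLNode *node = [[self alloc] init];
    node.css = css;
    node.children = children;
    return node;
}
@end

// Appends `node` and, recursively, its children to `output`. Text nodes are
// appended with the attributes inherited from their parent element.
static void AppendNode(NSMutableAttributedString *output, HTMLNode *node, NSDictionary *parentAttributes) {
    if (node.text) {
        NSAttributedString *run = [[NSAttributedString alloc] initWithString:node.text attributes:parentAttributes];
        [output appendAttributedString:run];
        return;
    }
    // Start from the parent's attributes, then let this element's CSS override them.
    NSMutableDictionary *attributes = [NSMutableDictionary dictionary];
    if (parentAttributes) [attributes addEntriesFromDictionary:parentAttributes];
    if (node.css) [attributes addEntriesFromDictionary:node.css];
    for (HTMLNode *child in node.children) {
        AppendNode(output, child, attributes);
    }
}
```

Modelling text as separate child nodes (rather than a single text field per element) keeps mixed content such as `Hello <span>world</span>!` in document order.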
Answered 2012-06-04T10:14:58.873