regex - How do I write a regular expression to get the text inside an XML tag?

Question

I am trying to write a regular expression that returns the text inside certain XML tags. For example, if I have a file in this format:

<name>Joe Blog</name>
<email>abc@sample.com</email>
<address>123 sample st</address>

How do I extract the text of the address field?

Any help with this would be greatly appreciated. Thanks.

score 2 · Accepted Answer

This expression captures the address value

<address>(.*?)<\/address>


and places it in the first capture group.

Example

Sample text

<name>Joe Blog</name>
<email>abc@sample.com</email>
<address>123 sample st</address>

Matches

[0][0] = <address>123 sample st</address>
[0][1] = 123 sample st
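
To try the same pattern from C++, a minimal sketch using the standard `<regex>` header could look like the following. The helper name `extract_tag_text` is hypothetical, and the sketch assumes the tag name is a plain word with no regex metacharacters. The `\/` escape from the pattern above is dropped because it is only needed where `/` delimits the expression.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of the original answer.
// Finds the first <tag>...</tag> pair in `input` and copies the text
// between the tags into `out`. Returns false when there is no match.
// Assumes `tag` contains no regex metacharacters.
bool extract_tag_text(const std::string& input,
                      const std::string& tag,
                      std::string& out) {
    // (.*?) is a lazy capture group: it stops at the first closing tag.
    const std::regex pattern("<" + tag + ">(.*?)</" + tag + ">");
    std::smatch match;
    if (!std::regex_search(input, match, pattern)) {
        return false;
    }
    out = match[1].str();  // match[0] is the whole match, match[1] is group 1
    return true;
}
```

Called with the sample text and the tag `address`, this fills `out` with `123 sample st`. In the default ECMAScript grammar `.` does not match line breaks, so a value that spans several lines would not be captured by this pattern.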

However

Most languages have an HTML parsing tool. For example, in PHP you can use:

$dom = new DOMDocument();
$dom->loadHTML($your_html_here);
$addresses = $dom->getElementsByTagName('address');
foreach ($addresses as $address) {
    // DOMElement has no innertext property; textContent holds the text inside the element
    $text = $address->textContent;
    // do something with $text
}

score 0 · Accepted Answer

Do you have to write it yourself, or can you use tinyxml2?

If you use tinyxml2 without the SAX parser and you know the document's structure, try the following:

/* ------ Example 2: Lookup information. ---- */    
{
    XMLDocument doc;
    doc.LoadFile( "dream.xml" );

    // Structure of the XML file:
    // - Element "PLAY"      the root Element, which is the 
    //                       FirstChildElement of the Document
    // - - Element "TITLE"   child of the root PLAY Element
    // - - - Text            child of the TITLE Element

    // Navigate to the title, using the convenience function,
    // with a dangerous lack of error checking.
    const char* title = doc.FirstChildElement( "PLAY" )->FirstChildElement( "TITLE" )->GetText();
    printf( "Name of play (1): %s\n", title );

    // Text is just another Node to TinyXML-2. The more
    // general way to get to the XMLText:
    XMLText* textNode = doc.FirstChildElement( "PLAY" )->FirstChildElement( "TITLE" )->FirstChild()->ToText();
    title = textNode->Value();
    printf( "Name of play (2): %s\n", title );
}

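If you want to use a SAX parser, tinyxml2 supports that mode as well. For example code, go to cocos2d-x and look at the CCSAXParser class, which calls and subclasses tinyxml2 to parse almost any XML file.

Sources: tinyXML2, cocos2d-x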
