
I want to use NSRegularExpression to extract the data between tags in an XML string.

Here is the XML:

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
<field left="493" top="670" right="1550" bottom="760" type="text">
<value encoding="utf-16">JENNIFER mml</value>
<line left="493" top="670" right="1550" bottom="733">
<char left="493" top="670" right="549" bottom="733" confidence="69">J</char>
<char left="565" top="670" right="605" bottom="718" confidence="71" suspicious="true">E</char>
<char left="623" top="670" right="660" bottom="718" confidence="76">N</char>
<char left="678" top="670" right="720" bottom="722" confidence="56">N</char>
<char left="736" top="674" right="776" bottom="730" confidence="80">I</char>
<char left="804" top="674" right="841" bottom="729" confidence="74">F</char>
<char left="858" top="670" right="902" bottom="725" confidence="80">E</char>
<char left="922" top="670" right="964" bottom="730" confidence="86">R</char>
<char left="965" top="670" right="1442" bottom="730" confidence="100" />
<char left="1443" top="685" right="1495" bottom="720" confidence="2" suspicious="true">m</char>
<char left="1492" top="685" right="1534" bottom="719" confidence="11" suspicious="true">m</char>
<char left="1544" top="685" right="1550" bottom="718" confidence="100" suspicious="true">l</char>
</line>
</field>
</document>

I want to extract the data between the value tags:

<value encoding="utf-16">JENNIFER mml</value>

Here is the iOS code:

 NSString *xml =@"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?><document xmlns=\"@link\" xmlns:xsi=\"@link\" xsi:schemaLocation=\"@link\" version=\"1.0\"><field left=\"493\" top=\"670\" right=\"1550\" bottom=\"760\" type=\"text\"><value encoding=\"utf-16\">JENNIFER mml</value><line left=\"493\" top=\"670\" right=\"1550\" bottom=\"733\"><char left=\"493\" top=\"670\" right=\"549\" bottom=\"733\" confidence=\"69\">J</char><char left=\"565\" top=\"670\" right=\"605\" bottom=\"718\" confidence=\"71\" suspicious=\"true\">E</char><char left=\"623\" top=\"670\" right=\"660\" bottom=\"718\" confidence=\"76\">N</char><char left=\"678\" top=\"670\" right=\"720\" bottom=\"722\" confidence=\"56\">N</char><char left=\"736\" top=\"674\" right=\"776\" bottom=\"730\" confidence=\"80\">I</char><char left=\"804\" top=\"674\" right=\"841\" bottom=\"729\" confidence=\"74\">F</char><char left=\"858\" top=\"670\" right=\"902\" bottom=\"725\" confidence=\"80\">E</char><char left=\"922\" top=\"670\" right=\"964\" bottom=\"730\" confidence=\"86\">R</char><char left=\"965\" top=\"670\" right=\"1442\" bottom=\"730\" confidence=\"100\"> </char><char left=\"1443\" top=\"685\" right=\"1495\" bottom=\"720\" confidence=\"2\" suspicious=\"true\">m</char><char left=\"1492\" top=\"685\" right=\"1534\" bottom=\"719\" confidence=\"11\" suspicious=\"true\">m</char><char left=\"1544\" top=\"685\" right=\"1550\" bottom=\"718\" confidence=\"100\" suspicious=\"true\">l</char></line></field></document>";
NSString *pattern = @"<value>(\\d+)</value>";

NSRegularExpression *regex = [NSRegularExpression
                              regularExpressionWithPattern:pattern
                              options:NSRegularExpressionCaseInsensitive
                              error:nil];
NSTextCheckingResult *textCheckingResult = [regex firstMatchInString:xml options:0 range:NSMakeRange(0, xml.length)];

NSRange matchRange = [textCheckingResult rangeAtIndex:1];
NSString *match = [xml substringWithRange:matchRange];
NSLog(@"Found string '%@'", match);

1 Answer

Your current regex matches only a bare <value> tag (no attributes) whose content is \d+, that is, one or more digits:

<value>(\d+)</value>

However, your input has an attribute (encoding="utf-16"), and its value (JENNIFER mml) contains no digits:

<value encoding="utf-16">JENNIFER mml</value>

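To overcome the first problem, you can either hard-code the attribute into the regex or loosen the pattern slightly so that it accepts any attributes: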

<value encoding="utf-16">
or
<value[^>]*>

To match the tag's value, which appears to be alphabetic (with spaces), and allowing digits as well, you can use:

[a-zA-Z0-9\s]+

So, altogether, you can try:

<value[^>]*>([a-zA-Z0-9\s]+)</value>

With your current code (for copy+paste):

NSString *pattern = @"<value[^>]*>([a-zA-Z0-9\\s]+)</value>";
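Dropped into the question's code, the pattern might be used roughly as follows. This is only a minimal sketch: the helper name FirstValueText is hypothetical, and the nil checks are an addition, since firstMatchInString: returns nil when nothing matches.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original code): returns the text of the
// first <value ...> element, or nil if the pattern does not match.
static NSString *FirstValueText(NSString *xml) {
    // [^>]* skips any attributes; the capture group allows letters, digits, whitespace.
    NSString *pattern = @"<value[^>]*>([a-zA-Z0-9\\s]+)</value>";

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }

    NSTextCheckingResult *result =
        [regex firstMatchInString:xml options:0 range:NSMakeRange(0, xml.length)];
    if (result == nil) {
        return nil; // no <value> element with matching content
    }

    // Capture group 1 is the text between the opening and closing tags.
    return [xml substringWithRange:[result rangeAtIndex:1]];
}
```

With the xml string from the question, NSLog(@"Found string '%@'", FirstValueText(xml)); should log Found string 'JENNIFER mml'.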

UPDATE (anything can match between <value></value>)

Per the comments, the exact text between the <value></value> tags can contain any characters, not just alphanumerics. To account for this, we can match everything with (.*):

<value[^>]*>(.*)</value>

With your current code:

NSString *pattern = @"<value[^>]*>(.*)</value>";
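
One caveat worth noting: (.*) is greedy, so if the input ever contains more than one <value> element on the same line, the capture would run on to the last </value>. A lazy (.*?) stops at the first closing tag instead. The sketch below is an illustration under that assumption, not part of the original answer; the helper name AllValueTexts is hypothetical, and NSRegularExpressionDotMatchesLineSeparators is added so that a value spanning several lines would still match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the text of every <value ...> element in the string.
static NSArray<NSString *> *AllValueTexts(NSString *xml) {
    // (.*?) is lazy, so each match ends at the nearest </value>.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<value[^>]*>(.*?)</value>"
                                                  options:NSRegularExpressionCaseInsensitive |
                                                          NSRegularExpressionDotMatchesLineSeparators
                                                    error:NULL];
    NSMutableArray<NSString *> *values = [NSMutableArray array];
    if (regex == nil) {
        return values;
    }

    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:xml options:0 range:NSMakeRange(0, xml.length)];
    for (NSTextCheckingResult *match in matches) {
        // Capture group 1 holds the text between the tags (possibly empty).
        [values addObject:[xml substringWithRange:[match rangeAtIndex:1]]];
    }
    return values;
}
```

For a single <value> element, as in the question, the greedy and lazy forms capture the same text.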
answered 2012-11-08T07:01:05.517