Good day,

Is there an alternative way to get everything inside a tag using a regular expression? Here is my code:

   MatchCollection matches = Regex.Matches(chek, "<bib-parsed>([^\000]*?)</bib-parsed>");

Here is the sample input:

   <bib-parsed>
   <cite>
   <pubinfo>
   <pub-year><i>1984</i></pub-year>
   <pub-place>Albuquerque</pub-place>
   <pub-name>Maxwell Museum of Anthropology and the University of New Mexico Press        </pub-name>
   </pubinfo>
   <bkinfo>
   <btl>The Galaz Ruin: A Prehistoric Mimbres Village in Southwestern New Mexico</btl>
   </bkinfo>
   </bib-parsed>

The sample above is matched, but when the pub-year contains a "0", as in "2001", the match fails. Is there an alternative? Thanks.

1 Answer

It appears your input is valid XML. If so, use the XML parsers in System.Xml or System.Xml.Linq instead of a regex. They are fast, and they handle nested tags and line breaks without any special pattern. For an input string containing multiple chunks like your example, using the System.Xml.Linq types:

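   using System.Linq;
   using System.Xml.Linq;

   // XDocument.Parse needs well-formed XML with a single root element.
   // e.Value is the concatenated text of each <bib-parsed> element, without the child tags.
   var bibChunks = XDocument.Parse(yourXmlString)
                            .Descendants("bib-parsed")
                            .Select(e => e.Value);

   foreach (string chunk in bibChunks) {
       // do stuff
   }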
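If you need the markup inside each element rather than only its text (which is closer to what the regex captured), a minimal sketch along the same lines follows. The class and method names (`BibParsedReader`, `InnerXml`) are hypothetical, and the sketch assumes the input is well-formed with a single root element. Note that the sample as posted opens `<cite>` without closing it, so a parser would reject it until that is fixed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class BibParsedReader
{
    // Returns the inner markup of every <bib-parsed> element, child tags included.
    // Assumes `xml` is well-formed and has exactly one root element;
    // XDocument.Parse throws an XmlException otherwise.
    public static List<string> InnerXml(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("bib-parsed")
            // Serialize each child node and join them, which gives the content
            // between the opening and closing <bib-parsed> tags.
            .Select(e => string.Concat(
                e.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting))))
            .ToList();
    }
}
```

Calling `BibParsedReader.InnerXml(yourXmlString)` gives one string per `<bib-parsed>` element. Because the default parse options drop insignificant whitespace, the result may not be byte-for-byte identical to the original text between the tags.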

That's all there is to it.
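As for why the original pattern fails on "2001": C# string literals have no octal escapes, so `"\000"` is the NUL character followed by two literal `0` characters, and `[^\000]` therefore excludes the digit 0 as well. If you do want to stay with a regex, a sketch that avoids the negated class by using `RegexOptions.Singleline` (so that `.` also matches newlines) could look like this. The names `BibParsedRegex` and `Extract` are hypothetical, and the pattern assumes `<bib-parsed>` carries no attributes and is never nested inside itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class BibParsedRegex
{
    // Singleline lets '.' match '\n', so the lazy group can span several lines.
    private static readonly Regex Pattern =
        new Regex("<bib-parsed>(.*?)</bib-parsed>", RegexOptions.Singleline);

    // Returns the raw text between each <bib-parsed> ... </bib-parsed> pair.
    public static List<string> Extract(string input)
    {
        return Pattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

This keeps the original whitespace and does not require well-formed XML, but it is less robust than a parser if the tag ever gains attributes or appears nested.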

Answered 2013-10-18T02:17:57.863