regex - Perl regular expression

Question

<ref id="ch02_ref1"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>J.M.</surname><given-names>Astilleros</given-names></name>

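This is a single line. I only need to extract the word between the <given-names> and </given-names> tags, which in this case is Astilleros. Is there a regular expression that does this? The problem I am facing is that there is no space between the word and the closing tag </given-names>, and "/" is the delimiter character in Perl regular expressions. Please help.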

The idea is to pull the names out, find them in the text on a page, and put <given-names>Astilleros</given-names> tags around them. I will definitely try an XML parser.

score 2 · Accepted Answer

Don't parse XML with regular expressions; it is too hard to get right. There are good parsers around, waiting to be used. Let's use XML::LibXML:

use strict; use warnings;
use XML::LibXML;

my $dom = XML::LibXML->load_xml(string => <<'END');
<ref id="ch02_ref1">
  <mixed-citation publication-type="journal">
    <person-group person-group-type="author">
      <name>
        <surname>J.M.</surname>
        <given-names>Astilleros</given-names>
      </name>
    </person-group>
  </mixed-citation>
</ref>
END

# use XPath to find your element
my ($name) = $dom->findnodes('//given-names');
print $name->textContent, "\n";

(Whatever you try, don't use XML::Simple!)
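For the second half of the task in the question (finding the extracted names in page text and wrapping them in tags), a minimal sketch might look like the following. It uses core Perl only. The @names list stands in for whatever the parser returned, and the $text string is made up for illustration; neither comes from the original post.

```perl
use strict;
use warnings;

# Assumption: these names were already extracted from the XML,
# e.g. by calling textContent on each node from findnodes('//given-names').
my @names = ('Astilleros');

# Hypothetical page text, invented for this example.
my $text = 'The experiments by Astilleros were repeated twice.';

# Build one alternation from all names. quotemeta escapes regex
# metacharacters; longer names go first so they win over shorter prefixes.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @names;

# Copy the text, then wrap every whole-word occurrence of a name.
# \b assumes each name starts and ends with a word character.
(my $tagged = $text) =~ s{\b($alternation)\b}{<given-names>$1</given-names>}g;

print "$tagged\n";
```

Using s{...}{...} with brace delimiters means the "/" in the closing tag needs no escaping in the replacement.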

score 0 · Accepted Answer

This should work as a regular expression:

/<given-names>(.*?)</

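Given your input, it will capture Astilleros.

This matches:

a literal <given-names>
then captures any character (except newline), zero or more times, as few as possible
until it reaches a literal <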
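The concern in the question about "/" can also be sidestepped by choosing a different delimiter for the match operator. The sketch below is one way to do it, not the answerer's code: it uses m{...} so the full closing tag can appear unescaped in the pattern, and [^<]* instead of .*? so the capture cannot run past the next tag. The variable names are chosen here for illustration.

```perl
use strict;
use warnings;

# The single line from the question, as posted.
my $line = '<ref id="ch02_ref1"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>J.M.</surname><given-names>Astilleros</given-names></name>';

# m{...} uses braces as delimiters, so "/" is an ordinary character here.
# [^<]* matches everything up to the next "<".
# In list context with /g, the match returns every captured group.
my @given_names = $line =~ m{<given-names>([^<]*)</given-names>}g;

print "$_\n" for @given_names;
```

With the original slash delimiters, the same pattern would need the slash escaped as <\/given-names>. As the accepted answer notes, this kind of matching is only reasonable for simple, predictable input like this one line.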
