3

This must be a dumb question, but I'm a bit stuck:

I have an XML file; here is a sample:

<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE tmx SYSTEM "56.dtd">
<body>
<tu changedate="20130625T175037Z">
  <tuv xml:lang="pt-pt">
    <prop type="x-context-pre">&lt;seg&gt;Some text.&lt;/seg&gt;</prop>
    <prop type="x-context-post">&lt;seg&gt;Other text.&lt;/seg&gt;</prop>
    <seg>The text I'm interested.</seg>
  </tuv>
  <tuv xml:lang="it">
    <seg>And it's translation in italian.</seg>
  </tuv>
 </tu> 

 .... followed by other <tu>'s
</body>

Since it's a huge file, I'm using XML::Twig to parse it and extract the parts I need. I'm particularly interested in the text content of the seg nodes, as well as the attribute of the tu node.

Here's the code I've got so far:

use 5.010;

use strict;
use warnings;

use XML::Twig;



my $filename = 'filename.tmx';
my $out_filename = 'out.xml';
open my $out, '>', $out_filename or die "Cannot open $out_filename: $!";
binmode $out;

my $original_twig = XML::Twig->new(pretty_print => 'nsgmls', twig_handlers => {tu => \&original_tu});
$original_twig->parsefile($filename);




sub original_tu {
    my($twig, $original_tu) = @_;
    my $original_seg = $original_tu-> first_child('./tuv/seg')->text;

}

Perl (or should I say XML::Twig) tells me that I've got: wrong navigation condition './tuv/seg' ()

Does anyone know how to access the seg node's text and, if you're not fed up with me already, how to access the changedate attribute of the tu node?

Thank you very much.

Dasen


3 Answers

2

Here is one way to access that node and attribute:

my $original_seg = $original_tu->first_child('tuv')->first_child('seg')->text;
my $date = $original_tu->att('changedate');
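Since each tu holds one tuv per language, the same child-by-child navigation can also collect every translation. This is only a sketch under a few assumptions: the XML is an inline sample trimmed from the question rather than the real file, %seg_for is a hypothetical name, and first_child_text is used so that a tuv without a seg yields an empty string instead of a method call on undef.

```perl
use strict;
use warnings;
use XML::Twig;

# Inline sample trimmed from the question (assumption: the real file has the same shape).
my $xml = <<'XML';
<body>
  <tu changedate="20130625T175037Z">
    <tuv xml:lang="pt-pt">
      <seg>The text I'm interested.</seg>
    </tuv>
    <tuv xml:lang="it">
      <seg>And it's translation in italian.</seg>
    </tuv>
  </tu>
</body>
XML

my %seg_for;    # hypothetical: language code => seg text
my $date;

my $twig = XML::Twig->new(
    twig_handlers => {
        tu => sub {
            my ($t, $tu) = @_;
            $date = $tu->att('changedate');
            # Walk the direct tuv children, one level at a time.
            for my $tuv ($tu->children('tuv')) {
                $seg_for{ $tuv->att('xml:lang') } = $tuv->first_child_text('seg');
            }
        },
    },
);
$twig->parse($xml);

print "$date\n";
print "$_: $seg_for{$_}\n" for sort keys %seg_for;
```

On the real file, parsefile($filename) would replace parse($xml), and the hash would need to be reset or stored per tu.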
Answered 2013-08-06T13:52:22.847
1

You can't use a complete XPath expression with first_child, just a single XPath step (i.e. you can only go down one level).

To use an XPath expression you need findnodes: my $original_seg = $original_tu->findnodes('./tuv/seg', 0)->text (the , 0 returns the first element of the potential list of hits).

To access an attribute, use $original_tu->att('changedate')
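Put together inside the handler, this might look like the sketch below. The inline XML is a trimmed copy of the question's sample, @rows is a hypothetical name, and the check on $seg is an added precaution for tu elements with no match, not something the original code had.

```perl
use strict;
use warnings;
use XML::Twig;

# Inline sample trimmed from the question (assumption: the real file has the same shape).
my $xml = <<'XML';
<body>
  <tu changedate="20130625T175037Z">
    <tuv xml:lang="pt-pt">
      <seg>The text I'm interested.</seg>
    </tuv>
    <tuv xml:lang="it">
      <seg>And it's translation in italian.</seg>
    </tuv>
  </tu>
</body>
XML

my @rows;    # hypothetical: one [changedate, first seg text] pair per tu

my $twig = XML::Twig->new(
    twig_handlers => {
        tu => sub {
            my ($t, $tu) = @_;
            # The offset 0 asks findnodes for the first hit only.
            my $seg = $tu->findnodes('./tuv/seg', 0);
            push @rows, [ $tu->att('changedate'), $seg ? $seg->text : '' ];
        },
    },
);
$twig->parse($xml);

print "$_->[0]\t$_->[1]\n" for @rows;
```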

Answered 2013-08-06T14:00:40.300
0

The condition used in first_child cannot use XPath. See https://metacpan.org/module/XML::Twig#cond for details. The method would have been misnamed if it did - first_child returns a child, but seg is a grandchild of tu.

You can use first_descendant('seg') instead.

To access the attribute, use the $original_tu->att('changedate') method.
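Because the file is described as huge, it may also help to free each tu once it has been handled. The following is a rough sketch, assuming an inline sample in place of the real file; the second tu and its changedate are made up for illustration, @pairs is a hypothetical name, and the purge call is my addition rather than part of the original code.

```perl
use strict;
use warnings;
use XML::Twig;

# Inline sample; the second tu is invented purely to show per-element handling.
my $xml = <<'XML';
<body>
  <tu changedate="20130625T175037Z">
    <tuv xml:lang="pt-pt"><seg>The text I'm interested.</seg></tuv>
    <tuv xml:lang="it"><seg>And it's translation in italian.</seg></tuv>
  </tu>
  <tu changedate="20130626T090000Z">
    <tuv xml:lang="pt-pt"><seg>Second source.</seg></tuv>
    <tuv xml:lang="it"><seg>Second target.</seg></tuv>
  </tu>
</body>
XML

my @pairs;    # hypothetical: [changedate, text of the first seg] per tu

my $twig = XML::Twig->new(
    twig_handlers => {
        tu => sub {
            my ($t, $tu) = @_;
            # first_descendant searches below the children too, in document order.
            my $seg = $tu->first_descendant('seg');
            push @pairs, [ $tu->att('changedate'), $seg->text ] if $seg;
            # Release everything parsed so far to keep memory use low.
            $t->purge;
        },
    },
);
$twig->parse($xml);

print "$_->[0]\t$_->[1]\n" for @pairs;
```

Note that first_descendant returns the seg of whichever tuv comes first in the tu, so this only fits when the source language is always listed first.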

Answered 2013-08-06T13:58:03.117