
I have some XML that contains a lot of attribute information; here is a small sample.

<?xml version="1.0" encoding="UTF-8"?>
 <collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>04170npc a22003613u 4500</leader>
    <controlfield tag="001">vtls003932502</controlfield>
    <controlfield tag="003">WlAbNL</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
        <subfield code="a">(WlAbNL)1002</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
        <subfield code="a">WlAbNL</subfield>
        <subfield code="b">eng</subfield>
        <subfield code="c">WlAbNL</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
        <subfield code="a">Scott Blair Collection,</subfield>
        <subfield code="f">1910 -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
        <subfield code="a">rheology</subfield>
    </datafield>
  </record>
  <record>
    <leader>04229npc a22005893u 4500</leader>
    <controlfield tag="001">vtls003932503</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
        <subfield code="a">(WlAbNL)1004</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
       <subfield code="a">WlAbNL</subfield>
       <subfield code="b">eng</subfield>
       <subfield code="c">WlAbNL</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
       <subfield code="a">Celtic Collection,</subfield>
       <subfield code="f">17th century -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
        <subfield code="a">Scottish Gaelic language</subfield>
    </datafield>
 </record>
</collection>

At the moment I have a PHP script that simply loads the whole document:

$xml = simplexml_load_file("Mapping_coll_wales.xml");
$records = $xml->record;

This creates an array of records that looks like this (I have cut it down to a single record):

  SimpleXMLElement Object
(
[leader] => 04170npc a22003613u 4500
[controlfield] => Array
    (
        [0] => vtls003932502
        [1] => WlAbNL
    )
 [datafield] => Array
    (
        [0] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 035
                        [ind1] =>  
                        [ind2] =>  
                    )

                [subfield] => (WlAbNL)1002
            )
        [1] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 040
                        [ind1] =>  
                        [ind2] =>  
                    )

                [subfield] => Array
                    (
                        [0] => WlAbNL
                        [1] => eng
                        [2] => WlAbNL
                    )

            )

        [2] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 245
                        [ind1] => 0
                        [ind2] => 0
                    )

                [subfield] => Array
                    (
                        [0] => Scott Blair Collection,
                        [1] => 1910 -
                    )
            )
        [3] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 653
                        [ind1] =>  
                        [ind2] =>  
                    )

                [subfield] => rheology
            )
    )

)

At the moment I am pulling out the fields I need by assuming their position in the array, and looping through each record (there are about 500):

for ($i =0; $i <5; $i++) {

echo '<strong>Title</strong> = : ' . $records[$i]->datafield[2]->subfield . '<br />';
echo '<strong>tag</strong>  = :' . $records[$i]->datafield[3]->subfield . '<br />';


echo '<br />------------------------------------------------------------------------<br />';
}

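However, the XML may contain other tags, so I don't want to rely on a given field being at index 2, and so on. Ideally I would like to be able to call it with something like: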

echo '<strong>Title</strong> = : ' . $records[$i]->datafield[245][a] . '<br />';

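I'm sure it's fairly straightforward and I'm just missing something, but it would be good to be able to load the tags as array indexes, or to have some way of getting a datafield directly by its tag and a subfield by its code, since those will not change.

Hope that makes sense.

Paul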


1 Answer


You can use XPath to match elements that satisfy specific conditions.

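However, because your nodes are namespaced, you must register the namespace on every node on which you call xpath() with a namespace-prefixed path expression.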
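To see why the registration matters, here is a minimal sketch, assuming a document trimmed down from the sample in the question. The prefix name "marc" is arbitrary; only the URI has to match the document's xmlns value.

```php
<?php
// Minimal sketch: a default-namespaced document, trimmed from the question's sample.
$xml = simplexml_load_string(
    '<collection xmlns="http://www.loc.gov/MARC21/slim">'
    . '<record><datafield tag="245" ind1="0" ind2="0">'
    . '<subfield code="a">Scott Blair Collection,</subfield>'
    . '</datafield></record></collection>'
);

$record = $xml->record[0];

// In XPath 1.0 an unprefixed name means "element in no namespace",
// so this does not match the namespaced <datafield>.
$unprefixed = $record->xpath('datafield');

// Bind a prefix to the namespace URI, then use that prefix in the path.
$record->registerXPathNamespace('marc', 'http://www.loc.gov/MARC21/slim');
$prefixed = $record->xpath('marc:datafield');

echo 'unprefixed matches: ' . ($unprefixed ? count($unprefixed) : 0) . PHP_EOL;
echo 'prefixed matches: ' . ($prefixed ? count($prefixed) : 0) . PHP_EOL;
```

The unprefixed query should find nothing, while the prefixed one should find the single datafield.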

See the example below, which does this inside a loop.

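$nsp = 'marc';
$nsuri = 'http://www.loc.gov/MARC21/slim';

// $xml is the SimpleXMLElement loaded in the question
$records = $xml->record;

foreach ($records as $record) {
    // Register the prefix on the node you are about to query
    $record->registerXPathNamespace($nsp, $nsuri);
    // Compare the tag as a string so that tags with leading zeros such as "035" match exactly
    $datafields = $record->xpath('marc:datafield[@tag="245"]');
    foreach ($datafields as $datafield) {
        // Nodes returned by xpath() are new objects, so register the prefix again
        $datafield->registerXPathNamespace($nsp, $nsuri);
        $subfields = $datafield->xpath('marc:subfield[@code="a"]');
        var_dump($subfields);
    }
}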
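The loop above can be wrapped in a small lookup function, which gets close to the "fetch by tag and code" call the question asks for. This is only a sketch: marcSubfield and MARC_NS are hypothetical names, it assumes PHP 7.1 or later for the nullable return type, and it assumes $tag and $code are trusted values because they are interpolated straight into the XPath expression.

```php
<?php
const MARC_NS = 'http://www.loc.gov/MARC21/slim';

/**
 * Hypothetical helper: return the first subfield with the given code inside
 * a datafield with the given tag, or null when nothing matches.
 * $tag and $code are assumed to be trusted (no quote escaping is done).
 */
function marcSubfield(SimpleXMLElement $record, string $tag, string $code): ?string
{
    $record->registerXPathNamespace('marc', MARC_NS);
    $matches = $record->xpath(
        "marc:datafield[@tag='$tag']/marc:subfield[@code='$code']"
    );
    // xpath() may return an empty array (or false on error); treat both as "no match"
    return $matches ? (string) $matches[0] : null;
}

// Sample data trimmed from the question
$sample = <<<'XML'
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Scott Blair Collection,</subfield>
      <subfield code="f">1910 -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
      <subfield code="a">rheology</subfield>
    </datafield>
  </record>
</collection>
XML;

$xml = simplexml_load_string($sample);

foreach ($xml->record as $record) {
    echo 'Title = ' . marcSubfield($record, '245', 'a') . PHP_EOL;
    echo 'Tag = ' . marcSubfield($record, '653', 'a') . PHP_EOL;
}
```

Returning null for a missing field lets the caller decide how to handle records that lack a given tag, instead of relying on array positions.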

Alternatively, you can descend with XPath alone instead of mixing it with SimpleXML object access. Here are two ways to get the same result:

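// Method calls on $xml->record act on the first <record> only; to process
// every record, run the code below inside a foreach over $xml->record.
$records = $xml->record;
$records->registerXPathNamespace($nsp, $nsuri);

$tags = array('245', '653');
$codes = array('a', 'f');

// METHOD 1: run an xpath for each tag/code combination
$desiredfields = array();
foreach ($tags as $tag) {
    $desiredsubfields = array();
    foreach ($codes as $code) {
        $subfields = $records->xpath("marc:datafield[@tag='$tag']/marc:subfield[@code='$code']");
        // Skip combinations that do not exist (e.g. 653/f in the sample data)
        if (!empty($subfields)) {
            $desiredsubfields[$code] = (string) $subfields[0];
        }
    }
    $desiredfields[$tag] = $desiredsubfields;
}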

var_export($desiredfields);

// METHOD 2: create a single xpath expression that matches every subfield you want
// Then visit each subfield retrieving tag from parent
$tagexpr = implode(' or ', array_map(function($t){return "@tag='{$t}'";}, $tags));
$codeexpr = implode(' or ', array_map(function($c){return "@code='{$c}'";}, $codes));
$xpath = "marc:datafield[{$tagexpr}]/marc:subfield[{$codeexpr}]";

$desiredfields = array();
$subfields = $records->xpath($xpath);

foreach ($subfields as $subfield) {
    $datafield = $subfield->xpath('..');
    $datafieldcode = (string) $datafield[0]['tag'];
    $desiredfields[$datafieldcode][(string) $subfield['code']] = (string) $subfield;
}

var_export($desiredfields);
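
If you would rather have the array shape from the question, something like $fields['245']['a'], another option is to skip XPath and build the index by plain iteration, since SimpleXML property access already works on this document. A minimal sketch: indexRecord is a hypothetical name, and values are collected in lists on the assumption that a tag or code can occur more than once within a record.

```php
<?php
/**
 * Hypothetical helper: index one record as $index[tag][code] = list of values.
 * Lists are used so that repeated tags or codes do not overwrite each other.
 */
function indexRecord(SimpleXMLElement $record): array
{
    $index = [];
    foreach ($record->datafield as $datafield) {
        $tag = (string) $datafield['tag'];
        foreach ($datafield->subfield as $subfield) {
            $code = (string) $subfield['code'];
            $index[$tag][$code][] = (string) $subfield;
        }
    }
    return $index;
}

// Sample data trimmed from the question
$sample = <<<'XML'
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Scott Blair Collection,</subfield>
      <subfield code="f">1910 -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
      <subfield code="a">rheology</subfield>
    </datafield>
  </record>
</collection>
XML;

$xml = simplexml_load_string($sample);

foreach ($xml->record as $record) {
    $fields = indexRecord($record);
    // isset() guards against records that lack the tag or code
    if (isset($fields['245']['a'])) {
        echo 'Title = ' . $fields['245']['a'][0] . PHP_EOL;
    }
}
```

This visits every datafield once per record, which may be simpler than running several XPath queries when you need many fields from each record.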
answered 2012-07-05T16:47:19.510