我有一个 SimpleXML 对象,它是通过合并来自 PubMed 的多个 XML(下面的片段)制成的,但是合并中有重复。如何比较所有第一个子数组 - array[][0]、array[][1] 等 - 并丢弃任何重复项?我虽然也许序列化是答案,但你不能序列化 SimpleXML 对象 afaik ..
我不知道从哪里开始?
Array
(
[0] => Array
(
[title] => SimpleXMLElement Object
(
[0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems.
)
[link] => SimpleXMLElement Object
(
[@attributes] => Array
(
[Version] => 1
)
[0] => 23010931
)
[author] => Aylett, CH., Löwe, J.
[journal] => SimpleXMLElement Object
(
[0] => Proc. Natl. Acad. Sci. U.S.A.
)
[pubdate] => 2012-9-27
[day] => SimpleXMLElement Object
(
[0] => 25
)
[month] => SimpleXMLElement Object
(
[0] => Sep
)
[year] => SimpleXMLElement Object
(
[0] => 2012
)
)
[1] => Array
(
[title] => SimpleXMLElement Object
(
[0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems.
)
[link] => SimpleXMLElement Object
(
[@attributes] => Array
(
[Version] => 1
)
[0] => 23010931
)
[author] => Aylett, CH., Löwe, J.
[journal] => SimpleXMLElement Object
(
[0] => Proc. Natl. Acad. Sci. U.S.A.
)
[pubdate] => 2012-9-27
[day] => SimpleXMLElement Object
(
[0] => 25
)
[month] => SimpleXMLElement Object
(
[0] => Sep
)
[year] => SimpleXMLElement Object
(
[0] => 2012
)
)
或者,它可以在初始 XML 合并阶段完成 - 如果有人可以建议如何修改它以删除重复项,我现在使用下面的代码?
function simplexml_merge (SimpleXMLElement &$xml1, SimpleXMLElement $xml2) {
$dom1 = new DomDocument();
$dom2 = new DomDocument();
$dom1->loadXML($xml1->asXML());
$dom2->loadXML($xml2->asXML());
$xpath = new domXPath($dom2);
$xpathQuery = $xpath->query('/*/*');
for ($i = 0; $i < $xpathQuery->length; $i++) {
$dom1->documentElement->appendChild(
$dom1->importNode($xpathQuery->item($i), true));
}
$xml1 = simplexml_import_dom($dom1);
}
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
simplexml_merge($xml1, $xml2);
谢谢。
……
为清楚起见 - 这是我要导入 SimpleXML 的 XML 源布局 - 每个 PubmedArticle 都是一个“元素”,我有兴趣比较并确保没有重复 -
<xml...>
<Document>
<PubmedArticle>
<MedlineCitation>
<PMID version="1">xxx</PMID>
...
</MedlineCitation>
...
</PubmedArticle>
<PubmedArticle>
<MedlineCitation>
<PMID version="1">xxx</PMID>
...
</MedlineCitation>
...
</PubmedArticle>
etc
</Document>
</xml>
PMID 节点是唯一的,因此可用于检查重复项。
……
使用来自@Gordon 的链接 - 我知道使用:
//Get my source XML
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
//Run through $xml1 and build a query based on it's PMIDs
$query = array();
foreach ($xml1->PubmedArticle as $paper) {
$query[] = sprintf('(PMID != %s)',$paper->MedlineCitation->PMID);
}
$query = implode('and', $query);
//Run through $xml2 and get node which don't have PMID matching $xml1
foreach ($xml2->xpath(sprintf('PubmedArticle/MedlineCitation[%s]', $query)) as $paper) {
echo $paper->asXml();
}
但是我仍然有一个问题 - 合并输出。的输出$xml2
首先缺少<PubmedArticle>
每个“匹配”周围的节点。然后我假设我可以使用相同的合并代码(上面)来进行合并。你能为我指出正确的方向吗?