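This is my first post on Stack, so please let me know if I haven't provided enough information. I have looked through many other answered questions and tried a number of the suggested solutions, which is how I ended up here.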
I can't get the data I need out of a series of about 800 XML files. I would like the following data frame.
Model                             Species                      PubChemID (from "rdf:li")
Abiotrophia_defectiva_ATCC_49176  M_10fthf__91__c__93__        122347
Abiotrophia_defectiva_ATCC_49176  M_10m3hddcaACP__91__c__93__  N/A
I can strip the rest of the URL from the PubChemID column once it is in the data frame.
The data should come from the following XML example.
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2" xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1" level="3" version="1" fbc:required="false" groups:required="false">
<model metaid="Abiotrophia_defectiva_ATCC_49176" id="Abiotrophia_defectiva_ATCC_49176" name="Abiotrophia defectiva ATCC 49176" fbc:strict="true">
<notes>
<body xmlns="http://www.w3.org/1999/xhtml">
<div>
<h1>Abiotrophia_defectiva_ATCC_49176</h1>
<h2>Description</h2>
<p>This is a metabolism reconstruction of Abiotrophia defectiva ATCC 49176</p>1.03<p>Authors: Stefania Magnusdottir, Almut Heinken, Laura Kutt, Dmitry A. Ravcheev, Eugen Bauer, Alberto Noronha, Kacy Greenhalgh, Christian Jaeger, Joanna Baginska, Paul Wilmes, Ronan M.T. Fleming, and Ines Thiele.</p>
<h3>Draft information</h3>
<p>
<ul>
<li> PubSEED ID: Abiotrophia defectiva ATCC 49176 (592010.4)</li>
<li> Draft reconstruction ID: Seed592010_4_124632</li>
<li> Draft platform: ModelSEED</li>
<li> Draft created: 7/1/2014</li>
</ul>
</p>
<p>This work is licensed under a <a href="https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License</a>.</p>
<p>When using this model in your research works, please cite: Magnusdottir et al., Generation of genome-scale metabolic reconstructions for 773 members of the human gut microbiota, Nat Biotechnol, 2016.</p></div>
</body>
</notes>
<annotation>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
<rdf:Description rdf:about="#Abiotrophia_defectiva_ATCC_49176">
<bqbiol:is>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/taxonomy/592010"/>
</rdf:Bag>
</bqbiol:is>
</rdf:Description>
</rdf:RDF>
</annotation>
<listOfUnitDefinitions>
<unitDefinition id="mmol_per_gDW_per_hr">
<listOfUnits>
<unit kind="mole" exponent="1" scale="-3" multiplier="1"/>
<unit kind="gram" exponent="-1" scale="0" multiplier="1"/>
<unit kind="second" exponent="-1" scale="0" multiplier="3600"/>
</listOfUnits>
</unitDefinition>
</listOfUnitDefinitions>
<listOfCompartments>
<compartment metaid="c" id="c" name="Cytoplasm" constant="false"/>
<compartment metaid="e" id="e" name="Extracellular" constant="false"/>
</listOfCompartments>
<listOfSpecies>
<species metaid="M_10fthf__91__c__93__" id="M_10fthf__91__c__93__" name="10-Formyltetrahydrofolate" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="-2" fbc:chemicalFormula="C20H21N7O7">
<annotation xmlns:sbml="http://www.sbml.org/sbml/level3/version1/core">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
<rdf:Description rdf:about="#M_10fthf__91__c__93__">
<bqbiol:is>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/hmdb/HMDB00972"/>
<rdf:li rdf:resource="http://identifiers.org/kegg.compound/C00234"/>
<rdf:li rdf:resource="http://identifiers.org/pubchem.compound/122347"/>
</rdf:Bag>
</bqbiol:is>
</rdf:Description>
</rdf:RDF>
</annotation>
</species>
<species metaid="M_10m3hddcaACP__91__c__93__" id="M_10m3hddcaACP__91__c__93__" name="10-methyl-3-hydroxy-dodecanoyl-ACP" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="-1" fbc:chemicalFormula="C24H45N2O9PRS"/>
<species metaid="M_10m3hundecACP__91__c__93__" id="M_10m3hundecACP__91__c__93__" name="10-methyl-3-hydroxy-undecanoyl-ACP" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="-1" fbc:chemicalFormula="C23H43N2O9PRS"/>
<species metaid="M_10m3oddcaACP__91__c__93__" id="M_10m3oddcaACP__91__c__93__" name="10-methyl-3-oxo-dodecanoyl-ACP" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="-1" fbc:chemicalFormula="C24H43N2O9PRS"/>
</listOfSpecies>
</model>
</sbml>
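I have managed to convert the file to a list in R and to extract the first element I need; the same should then be done for the other 800 XML files.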
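library(XML)  # xmlToList() comes from the XML package, not xml2
model_list <- xmlToList("StackExample.xml")  # avoid naming the object `list`, which masks base::list
# Model name, taken from the <h1> inside the notes
model_list[["model"]][["notes"]][["body"]][["div"]][["h1"]]
I can also pull out all the species, but the fact that some nodes contain more levels of hierarchy confuses me a bit.
species.list <- model_list$model$listOfSpecies
# First element of each species entry
specieslist <- lapply(species.list, '[[', 1)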
How can I add an if/else kind of function to the "lapply" call so that it looks for the "rdf:resource" values in the additional hierarchy?
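One possible way to get the if/else behaviour, sketched here under assumptions rather than offered as the definitive answer: instead of walking the nested list from xmlToList(), query the document with XPath through the xml2 package and return NA for any species that has no pubchem.compound entry. The inline XML is a trimmed stand-in for the sample above, and the names sbml_txt, pubchem_id and result are made up for illustration. The local-name() tests are used only to sidestep namespace prefixes.

```r
library(xml2)

# Trimmed stand-in for one model file (assumption: same layout as the sample above).
sbml_txt <- '<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="Abiotrophia_defectiva_ATCC_49176">
    <listOfSpecies>
      <species id="M_10fthf__91__c__93__">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#M_10fthf__91__c__93__">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/hmdb/HMDB00972"/>
                  <rdf:li rdf:resource="http://identifiers.org/pubchem.compound/122347"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>
      <species id="M_10m3hddcaACP__91__c__93__"/>
    </listOfSpecies>
  </model>
</sbml>'

doc <- read_xml(sbml_txt)

# Match on local names so no namespace prefixes are needed in the XPath.
species <- xml_find_all(doc, "//*[local-name()='species']")

pubchem_id <- vapply(species, function(sp) {
  # All rdf:resource URLs below this species; empty when there is no annotation.
  urls <- xml_attr(xml_find_all(sp, ".//*[local-name()='li']"), "resource")
  hit  <- grep("pubchem.compound", urls, fixed = TRUE, value = TRUE)
  # The if/else step: NA when nothing matched, otherwise keep only the ID after the last "/".
  if (length(hit) == 0) NA_character_ else sub(".*/", "", hit[1])
}, character(1))

result <- data.frame(
  Model     = xml_attr(xml_find_first(doc, "//*[local-name()='model']"), "id"),
  Species   = xml_attr(species, "id"),
  PubChemID = pubchem_id,
  stringsAsFactors = FALSE
)
print(result)
```

Because the missing case is handled inside the function, every species yields exactly one value, so the columns line up without any extra bookkeeping.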
Finally, I'm fairly sure that applying whatever script comes out of this to the rest of the files should be doable.
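As a rough sketch of that last step, assuming all files share the layout of the sample: wrap the per-file extraction in a function, list the files, and bind the results together. The helper extract_pubchem(), the folder name sbml_demo and the two tiny files written to a temporary directory are all hypothetical and exist only so the example runs on its own; with the real data, list.files() would point at the folder holding the roughly 800 files.

```r
library(xml2)

# Hypothetical helper: one Model / Species / PubChemID data frame per SBML file.
extract_pubchem <- function(path) {
  doc     <- read_xml(path)
  species <- xml_find_all(doc, "//*[local-name()='species']")
  ids <- vapply(species, function(sp) {
    urls <- xml_attr(xml_find_all(sp, ".//*[local-name()='li']"), "resource")
    hit  <- grep("pubchem.compound", urls, fixed = TRUE, value = TRUE)
    if (length(hit) == 0) NA_character_ else sub(".*/", "", hit[1])
  }, character(1))
  model_id <- xml_attr(xml_find_first(doc, "//*[local-name()='model']"), "id")
  data.frame(
    Model     = rep(model_id, length(species)),
    Species   = xml_attr(species, "id"),
    PubChemID = ids,
    stringsAsFactors = FALSE
  )
}

# Two tiny stand-in files in a temporary folder (made up for this demo).
model_dir <- file.path(tempdir(), "sbml_demo")
dir.create(model_dir, showWarnings = FALSE)
template <- '<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"><model id="%s"><listOfSpecies>
<species id="M_a"><annotation><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description><rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/pubchem.compound/122347"/>
</rdf:Bag></rdf:Description></rdf:RDF></annotation></species>
<species id="M_b"/>
</listOfSpecies></model></sbml>'
for (m in c("model_one", "model_two")) {
  writeLines(sprintf(template, m), file.path(model_dir, paste0(m, ".xml")))
}

# Same pattern for the real data: list the files, extract each, stack the rows.
files      <- list.files(model_dir, pattern = "\\.xml$", full.names = TRUE)
all_models <- do.call(rbind, lapply(files, extract_pubchem))
print(all_models)
```

Parsing one file at a time keeps memory use bounded by the largest single model, which may matter with several hundred files.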
Thanks