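This is my first post on Stack, so please let me know if I haven't included enough information. I have looked through many other answered questions and tried many of the suggested solutions, which is how I ended up here.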

I am unable to get data out of a series of about 800 XML files. I would like the following data frame.

Model                                Species                       PubChemID (from "rdf:li")
Abiotrophia_defectiva_ATCC_49176     M_10fthf__91__c__93__         122347
Abiotrophia_defectiva_ATCC_49176     M_10m3hddcaACP__91__c__93__   N/A

I can strip the rest of the URL from the PubChemID column in the data frame myself.

The data should come from XML like the following example.

<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2" xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1" level="3" version="1" fbc:required="false" groups:required="false">
  <model metaid="Abiotrophia_defectiva_ATCC_49176" id="Abiotrophia_defectiva_ATCC_49176" name="Abiotrophia defectiva ATCC 49176" fbc:strict="true">
    <notes>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <div>
          <h1>Abiotrophia_defectiva_ATCC_49176</h1>
          <h2>Description</h2>
          <p>This is a metabolism reconstruction of Abiotrophia defectiva ATCC 49176</p>1.03<p>Authors: Stefania Magnusdottir, Almut Heinken, Laura Kutt, Dmitry A. Ravcheev, Eugen Bauer, Alberto Noronha, Kacy Greenhalgh, Christian Jaeger, Joanna Baginska, Paul Wilmes, Ronan M.T. Fleming, and Ines Thiele.</p>
          <h3>Draft information</h3>
          <p>
            <ul>
              <li> PubSEED ID: Abiotrophia defectiva ATCC 49176 (592010.4)</li>
              <li> Draft reconstruction ID: Seed592010_4_124632</li>
              <li> Draft platform: ModelSEED</li>
              <li> Draft created: 7/1/2014</li>
            </ul>
          </p>
          <p>This work is licensed under a <a href="https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License</a>.</p>
          <p>When using this model in your research works, please cite: Magnusdottir et al., Generation of genome-scale metabolic reconstructions for 773 members of the human gut microbiota, Nat Biotechnol, 2016.</p></div>
        </body>
      </notes>
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
        <rdf:Description rdf:about="#Abiotrophia_defectiva_ATCC_49176">
          <bqbiol:is>
            <rdf:Bag>
              <rdf:li rdf:resource="http://identifiers.org/taxonomy/592010"/>
            </rdf:Bag>
          </bqbiol:is>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
    <listOfUnitDefinitions>
      <unitDefinition id="mmol_per_gDW_per_hr">
        <listOfUnits>
          <unit kind="mole" exponent="1" scale="-3" multiplier="1"/>
          <unit kind="gram" exponent="-1" scale="0" multiplier="1"/>
          <unit kind="second" exponent="-1" scale="0" multiplier="3600"/>
        </listOfUnits>
      </unitDefinition>
    </listOfUnitDefinitions>
    <listOfCompartments>
      <compartment metaid="c" id="c" name="Cytoplasm" constant="false"/>
      <compartment metaid="e" id="e" name="Extracellular" constant="false"/>
    </listOfCompartments>
    <listOfSpecies>
      <species metaid="M_10fthf__91__c__93__" id="M_10fthf__91__c__93__" name="10-Formyltetrahydrofolate" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="-2" fbc:chemicalFormula="C20H21N7O7">
        <annotation xmlns:sbml="http://www.sbml.org/sbml/level3/version1/core">
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
            <rdf:Description rdf:about="#M_10fthf__91__c__93__">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/hmdb/HMDB00972"/>
                  <rdf:li rdf:resource="http://identifiers.org/kegg.compound/C00234"/>
                  <rdf:li rdf:resource="http://identifiers.org/pubchem.compound/122347"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>
      <species metaid="M_10m3hddcaACP__91__c__93__" id="M_10m3hddcaACP__91__c__93__" name="10-methyl-3-hydroxy-dodecanoyl-ACP" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="-1" fbc:chemicalFormula="C24H45N2O9PRS"/>
      <species metaid="M_10m3hundecACP__91__c__93__" id="M_10m3hundecACP__91__c__93__" name="10-methyl-3-hydroxy-undecanoyl-ACP" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="-1" fbc:chemicalFormula="C23H43N2O9PRS"/>
      <species metaid="M_10m3oddcaACP__91__c__93__" id="M_10m3oddcaACP__91__c__93__" name="10-methyl-3-oxo-dodecanoyl-ACP" compartment="c" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="-1" fbc:chemicalFormula="C24H43N2O9PRS"/>
      </listOfSpecies>
      </model>
</sbml>

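I have successfully converted the file to a list in R and can pull out the first element, which should also work for the other 800 XML files: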

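library(XML)  # xmlToList() is provided by the XML package, not xml2
list <- xmlToList("StackExample.xml")
# Model name, taken from the <h1> inside the notes block
list[["model"]][["notes"]][["body"]][["div"]][["h1"]]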

I can also pull out all of the species, but the fact that some nodes contain more levels of hierarchy than others is confusing me a bit.

species.list <- list$model$listOfSpecies
specieslist <- lapply(species.list, '[[', 1)

How can I add an if/else-type function to the "lapply" call so that it looks for the "rdf:resource" values in the additional hierarchy, where it exists?
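One possible way around the if/else is to skip the list conversion and query the document with XPath through xml2 instead. The sketch below is only an illustration under a few assumptions: the function name `extract_pubchem` is made up, the namespace URIs are copied from the sample file, and a PubChem entry is assumed to be any "rdf:li" whose "rdf:resource" contains "pubchem.compound". Species without an annotation simply come back as NA, so no explicit branching is needed.

```r
library(xml2)

# Prefixes used in the XPath queries below; the URIs are copied from the sample file.
ns <- c(
  sbml = "http://www.sbml.org/sbml/level3/version1/core",
  rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
)

# Hypothetical helper: one row per species, with the PubChem ID or NA.
extract_pubchem <- function(path) {
  doc     <- read_xml(path)
  model   <- xml_find_first(doc, "/sbml:sbml/sbml:model", ns)
  species <- xml_find_all(model, "./sbml:listOfSpecies/sbml:species", ns)

  # For a node set, xml_find_first() returns one result per species;
  # species without a matching rdf:li yield a missing node.
  li <- xml_find_first(
    species,
    ".//rdf:li[contains(@rdf:resource, 'pubchem.compound')]",
    ns
  )

  # xml_attr() gives NA for missing nodes; without a namespace spec it
  # matches the attribute by its local name ("resource").
  url <- xml_attr(li, "resource")

  data.frame(
    Model     = rep(xml_attr(model, "id"), length(species)),
    Species   = xml_attr(species, "id"),
    PubChemID = sub(".*/", "", url),  # keep only the part after the last "/"
    stringsAsFactors = FALSE
  )
}
```

Calling `extract_pubchem("StackExample.xml")` on the sample above should give one row per species, with the URL prefix already stripped from the PubChemID column.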

Finally, I am fairly sure that applying whatever script works to the rest of the files should be feasible.
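For the batch step, a rough sketch could look like the following. The names `read_all_models` and `extract_one` are hypothetical: `extract_one` stands for whatever per-file function ends up returning one data frame per XML file, and the folder is assumed to contain only the model files.

```r
# Hypothetical helper: run a per-file extractor over every .xml file in a
# folder and stack the resulting data frames into one.
read_all_models <- function(dir, extract_one) {
  files <- list.files(dir, pattern = "\\.xml$", full.names = TRUE)

  results <- lapply(files, function(f) {
    # tryCatch keeps a single malformed file from aborting the whole run
    tryCatch(
      extract_one(f),
      error = function(e) {
        warning("Skipping ", basename(f), ": ", conditionMessage(e))
        NULL
      }
    )
  })

  # Drop the files that failed, then bind the remaining rows together
  results <- Filter(Negate(is.null), results)
  do.call(rbind, results)
}
```

With about 800 files, a skipped file shows up as a warning rather than stopping the loop, which makes it easier to see afterwards which models need a closer look.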

Thanks
