xml - R xmlToDataFrame XML TEI

Question

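Someone sent me an XML TEI (Text Encoding Initiative) file to process in R... I'm not an XML expert, nor a TEI expert (I don't know whether the file is well-formed). All my attempts have failed... Here is one of my files: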

<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Luxury Bound</title>
      </titleStmt>
      <publicationStmt>
        <p/>
      </publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <country>unknown</country>
            <msName>unknown location (Hours by a follower of Jean Semont)</msName>
          </msIdentifier>
          <msContents>
            <msItemStruct/>
            <msItem>
              <p xml:id="content1">Hours (Tournai)</p>
            </msItem>
          </msContents>
          <physDesc>
            <decoDesc>
              <p>Information on the illustrations : </p>
              <p>Total number of illustrations : </p>
              <p>Number of miniatures : </p>
              <p>Number of historiated initials : </p>
              <p>Number of grisailles : </p>
              <p>Number of drawings : </p>
              <p>
                <listPerson type="miniaturists">
                  <person>
                    <persName>Jean Semont (follower)</persName>
                  </person>
                </listPerson>
              </p>
            </decoDesc>
....
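On the question of whether the file is well-formed: `xmlParse()` stops with an R error on malformed XML, so the fact that `xmlParse("luxud1.xml")` and `summary(doc)` in the question succeeded suggests the full file is well-formed. The excerpt above is cut off at `....`, so it cannot be judged on its own. Well-formedness says nothing about validity against the TEI schema, which this does not check. A minimal sketch of a TRUE/FALSE check, using a hypothetical helper `is_well_formed`:

```r
library(XML)

# Hypothetical helper: TRUE if `x` parses as well-formed XML, FALSE otherwise.
# xmlParse() signals an R error on malformed input (or an unreadable file),
# and tryCatch turns that error into FALSE. The parser may still print its message.
is_well_formed <- function(x, asText = FALSE) {
  tryCatch({
    xmlParse(x, asText = asText)
    TRUE
  }, error = function(e) FALSE)
}

# Checking a file on disk would look like: is_well_formed("luxud1.xml")
ok  <- is_well_formed('<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/></TEI>', asText = TRUE)
bad <- is_well_formed('<TEI><teiHeader></TEI>', asText = TRUE)  # mismatched tags
```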

I tried:

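library('XML')
doc <- xmlParse("luxud1.xml")  # parse the TEI file into an internal document
summary(doc)                   # count of each element name, plus the total node count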

$nameCounts

        catDesc        category               p           title         measure             val            date 
             11              11              10               6               4               4               3 
             ab       langUsage        language        origDate        persName             TEI      additional 
              2               2               2               2               2               1               1 
      adminInfo    availability            bibl         binding     bindingDesc          catRef       classDecl 
              1               1               1               1               1               1               1 
        country        decoDesc    encodingDesc          extent        fileDesc              hi         history 
              1               1               1               1               1               1               1 
       listBibl      listPerson      measureGrp      msContents          msDesc    msIdentifier          msItem 
              1               1               1               1               1               1               1 
   msItemStruct          msName            note      objectDesc          origin          person        physDesc 
              1               1               1               1               1               1               1 
      placeName       principal     profileDesc publicationStmt             ref          region      settlement 
              1               1               1               1               1               1               1 
     sourceDesc     supportDesc        taxonomy       teiHeader       textClass       titleStmt 
              1               1               1               1               1               1 

$numNodes
[1] 102
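The counts above list element names but not their namespace. Every element in this file inherits the TEI default namespace declared on the root (`<TEI xmlns="http://www.tei-c.org/ns/1.0">`), and in XPath 1.0, which libxml2 and therefore the XML package use, an unprefixed name test such as `persName` matches only elements in no namespace. A quick comparison on a small inline excerpt (a stand-in for luxud1.xml, not the real file):

```r
library(XML)

# Inline excerpt of the decoDesc part of the file (stand-in for luxud1.xml)
tei_text <- '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader>
  <listPerson type="miniaturists">
    <person><persName>Jean Semont (follower)</persName></person>
  </listPerson></teiHeader></TEI>'
doc <- xmlParse(tei_text, asText = TRUE)

# Namespace declarations on the document (prefix -> URI)
print(xmlNamespaceDefinitions(doc, simplify = TRUE))

# Unprefixed name: looks for persName in *no* namespace, so nothing matches
n_plain <- length(getNodeSet(doc, "//persName"))

# Bind an arbitrary prefix ("tei") to the TEI URI and use it in the path
tei_ns <- c(tei = "http://www.tei-c.org/ns/1.0")
n_tei <- length(getNodeSet(doc, "//tei:persName", namespaces = tei_ns))

cat("//persName:", n_plain, " //tei:persName:", n_tei, "\n")
```

Only the URI has to match the document; the prefix name is local to the query.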

And if I try:

p <- xmlToDataFrame(doc, homogeneous = FALSE, nodes = getNodeSet(doc, "//persName"))

I get something strange... a concatenation of all the values in the file... Could you suggest a good way to do this? Thank you.
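A likely explanation, assuming the full file keeps the TEI default namespace shown on `<TEI>`: `//persName` carries no namespace prefix, so `getNodeSet` returns an empty node set. `xmlToDataFrame` then gets `nodes` of length zero and appears to fall back to its default of using the root's children (here `teiHeader`) as records, whose text values come out concatenated. A sketch that binds a prefix to the TEI URI and reads text with `xpathSApply(..., xmlValue)` instead; the inline excerpt stands in for luxud1.xml, and the helper `tei_row` with its choice of fields is a hypothetical illustration, not part of the XML package:

```r
library(XML)

# Arbitrary local prefix bound to the TEI namespace URI
tei_ns <- c(tei = "http://www.tei-c.org/ns/1.0")

# Trimmed inline copy of the header above (stand-in for luxud1.xml)
tei_text <- '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>
  <titleStmt><title>Luxury Bound</title></titleStmt>
  <sourceDesc><msDesc>
    <msIdentifier><msName>unknown location (Hours by a follower of Jean Semont)</msName></msIdentifier>
    <physDesc><decoDesc><p><listPerson type="miniaturists">
      <person><persName>Jean Semont (follower)</persName></person>
    </listPerson></p></decoDesc></physDesc>
  </msDesc></sourceDesc></fileDesc></teiHeader></TEI>'
doc <- xmlParse(tei_text, asText = TRUE)

# Text of every persName element, one data frame row per element
pers <- xpathSApply(doc, "//tei:persName", xmlValue, namespaces = tei_ns)
persons <- data.frame(persName = pers, stringsAsFactors = FALSE)
print(persons)

# Hypothetical helper: one row of selected header fields per document.
# Several matches are pasted together with "; ", no match gives NA.
tei_row <- function(doc) {
  get_text <- function(path) {
    vals <- xpathSApply(doc, path, xmlValue, namespaces = tei_ns)
    if (length(vals) == 0) NA_character_ else paste(vals, collapse = "; ")
  }
  data.frame(
    title    = get_text("//tei:titleStmt/tei:title"),
    msName   = get_text("//tei:msIdentifier/tei:msName"),
    persName = get_text("//tei:persName"),
    stringsAsFactors = FALSE
  )
}
row <- tei_row(doc)
print(row)
```

For several files, something like `do.call(rbind, lapply(list.files(pattern = "\\.xml$"), function(f) tei_row(xmlParse(f))))` should give one row per file; the file-name pattern is an assumption. `xmlToDataFrame` is geared to record-like nodes whose children are the fields, which is why a single text element such as `persName` is read more directly with `xmlValue`.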
