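Someone sent me XML TEI (Text Encoding Initiative) files to process in R... I am not an XML expert, nor a TEI expert (I don't know whether the file is well-formed). All my attempts have been unsuccessful... Here is one of my files: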
<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Luxury Bound</title>
</titleStmt>
<publicationStmt>
<p/>
</publicationStmt>
<sourceDesc>
<msDesc>
<msIdentifier>
<country>unknown</country>
<msName>unknown location (Hours by a follower of Jean Semont)</msName>
</msIdentifier>
<msContents>
<msItemStruct/>
<msItem>
<p xml:id="content1">Hours (Tournai)</p>
</msItem>
</msContents>
<physDesc>
<decoDesc>
<p>Information on the illustrations : </p>
<p>Total number of illustrations : </p>
<p>Number of miniatures : </p>
<p>Number of historiated initials : </p>
<p>Number of grisailles : </p>
<p>Number of drawings : </p>
<p>
<listPerson type="miniaturists">
<person>
<persName>Jean Semont (follower)</persName>
</person>
</listPerson>
</p>
</decoDesc>
....
I tried:
library('XML')
doc<-xmlParse("luxud1.xml")
summary(doc)
$nameCounts
catDesc category  p title measure val date
     11       11 10     6       4   4    3
ab langUsage language origDate persName TEI additional
 2         2        2        2        2   1          1
adminInfo availability bibl binding bindingDesc catRef classDecl
        1            1    1       1           1      1         1
country decoDesc encodingDesc extent fileDesc hi history
      1        1            1      1        1  1       1
listBibl listPerson measureGrp msContents msDesc msIdentifier msItem
       1          1          1          1      1            1      1
msItemStruct msName note objectDesc origin person physDesc
           1      1    1          1      1      1        1
placeName principal profileDesc publicationStmt ref region settlement
        1         1           1               1   1      1          1
sourceDesc supportDesc taxonomy teiHeader textClass titleStmt
         1           1        1         1         1         1
$numNodes
[1] 102
If I try:
p<-xmlToDataFrame(doc,homogeneous=FALSE, nodes= getNodeSet(doc, "//persName") )
I get something strange: a concatenation of all the values in the file... Can you suggest a good way to do this? Thank you.
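One likely cause, offered as a sketch rather than a confirmed diagnosis: the root element declares a default namespace (`xmlns="http://www.tei-c.org/ns/1.0"`), so an unprefixed XPath such as `//persName` may match nothing, and `xmlToDataFrame()` may then fall back to processing the whole document. The example below binds that namespace URI to a prefix and queries with it. The prefix `tei`, the inline sample string, and the variable names `ns`, `names_found`, and `people` are assumptions made for illustration; with the real file, `xmlParse("luxud1.xml")` would replace the inline string.

```r
library(XML)

# Minimal inline sample mirroring the structure of the file above
# (hypothetical stand-in for luxud1.xml).
tei_text <- '<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <listPerson type="miniaturists">
      <person>
        <persName>Jean Semont (follower)</persName>
      </person>
    </listPerson>
  </teiHeader>
</TEI>'

doc <- xmlParse(tei_text, asText = TRUE)

# Bind the TEI namespace URI to a prefix of our choosing ("tei" is arbitrary).
ns <- c(tei = "http://www.tei-c.org/ns/1.0")

# Query with the prefixed element name and extract the text of each match.
names_found <- xpathSApply(doc, "//tei:persName", xmlValue, namespaces = ns)

# Build the data frame by hand, one column per extracted field.
people <- data.frame(persName = names_found, stringsAsFactors = FALSE)
print(people)
```

The same pattern should extend to other elements, for example `//tei:msName` or `//tei:country`, with each result placed in its own column.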