
Sorry, there is still little documentation on parsing XML-TEI in R, especially for an R beginner.

I am counting several node sets with the function 'getNodeSet', and the queries differ in only one 'contains' value. The aim is to count the nodes with '@type=verb' for each specific 'contains' value; all of these queries have 'contains(@ana,'#action')' in common. Examples:

# the differing value is '#displacement'
nodes <- getNodeSet(doc, "//ns:w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana, '#displacement') and contains(@ana, '#ANT')]", ns)
VARIABLE_NAME01 <- length(nodes)
VARIABLE_NAME01
# result in the console: 2

# the differing value is '#put_together'
nodes <- getNodeSet(doc, "//ns:w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana, '#put_together') and contains(@ana, '#ANT')]", ns)
VARIABLE_NAME02 <- length(nodes)
VARIABLE_NAME02
# result in the console: 0

# the differing value is '#destruction'
nodes <- getNodeSet(doc, "//ns:w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana, '#destruction') and contains(@ana, '#ANT')]", ns)
VARIABLE_NAME03 <- length(nodes)
VARIABLE_NAME03
# result in the console: 7

But of course it is very tedious to write basically the same thing every time, and it is not very elegant.

Is it possible to have something like the following (sorry, it is not properly encoded, just an example to show what I need):

# a condition
For node = getNodeSet(doc, "//ns:w[contains(@type,'verb') and contains(@ana,'#action') and not(contains(@ana, '#ANT'))]", ns)
# add to the contains() conditions
(
(@ana, 'DIFFERENT VALUE01') FOR VARIABLE01
(@ana, 'DIFFERENT VALUE02') FOR VARIABLE02
(@ana, 'DIFFERENT VALUE03') FOR VARIABLE03
)
# etc.

Does anyone have an idea how to do this?
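The part of the query that never changes can be written once as a template, with only the differing '@ana' value filled in. A minimal base-R sketch, assuming the same query as in the examples (with contains(@ana, '#ANT')); the names template, values and queries are hypothetical:

```r
# Shared part of the XPath query; %s marks where the differing @ana value goes
template <- "//ns:w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana,'%s') and contains(@ana,'#ANT')]"

# The values that change from one count to the next, labelled by a short name
values <- c(displacement = "#displacement",
            put_together = "#put_together",
            destruction  = "#destruction")

# sprintf() is vectorised: one query string per value, names kept with setNames()
queries <- setNames(sprintf(template, values), names(values))
queries[["destruction"]]
```

Each element of queries can then be passed to getNodeSet(doc, queries[[i]], ns) in a loop or with lapply().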

Afterwards, I need to be able to add up the results:

add_result <- sum(VARIABLE_NAME01, VARIABLE_NAME02, VARIABLE_NAME03)
add_result
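If the counts are kept in one named vector instead of separate variables, the total is a single call to sum(). A small sketch with the three counts from the console output above typed in by hand (the name counts is hypothetical):

```r
# Counts reported in the console for '#displacement', '#put_together' and '#destruction'
counts <- c(displacement = 2, put_together = 0, destruction = 7)

# One call adds them all, however many values there are
add_result <- sum(counts)
add_result
```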

Then I thought of something like this:

nodes=sum(
 (getNodeSet(doc,"//ns:w[contains(@ana,'#action') and contains(@type,'verb')]", ns)),
 (getNodeSet(doc,"//ns:w[contains(@type,'verb') and contains(@type,'verb')]", ns))
 )
add_result <- length(nodes) 
add_result

and then repeating it for the nodes with each of the other values. But sadly, it doesn't work.
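The attempt most likely fails because sum() adds numbers, not node sets (a node set is a list). A sketch of two ways that do work, using a tiny TEI fragment invented for illustration (the fragment and the names q1, q2, total, both are hypothetical): add the lengths of the node sets, or let XPath merge the two queries with its union operator |.

```r
library(XML)

# Miniature TEI fragment, invented for illustration only
tei <- '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
  <w type="verb" ana="#action #displacement #ANT">ran</w>
  <w type="verb" ana="#action #displacement #ANT">fled</w>
  <w type="verb" ana="#action #destruction #ANT">broke</w>
  <w type="noun" ana="#action #destruction #ANT">ruin</w>
</p></body></text></TEI>'
doc <- xmlParse(tei, asText = TRUE)
ns <- c(ns = "http://www.tei-c.org/ns/1.0")

q1 <- "//ns:w[contains(@type,'verb') and contains(@ana,'#displacement')]"
q2 <- "//ns:w[contains(@type,'verb') and contains(@ana,'#destruction')]"

# Option 1: count each node set, then add the counts
total <- length(getNodeSet(doc, q1, ns)) + length(getNodeSet(doc, q2, ns))
total  # 3

# Option 2: a single query joined with the XPath union operator |
both <- getNodeSet(doc, paste(q1, q2, sep = " | "), ns)
length(both)  # 3
```

The two options differ only when a node matches both conditions: the union returns it once, while adding the lengths counts it twice.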

Thanks in advance for your suggestions.


2 Answers


What I have done so far:

nodes <- getNodeSet(doc, "//ns:w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana, '#displacement') and contains(@ana, '#ANT')]", ns)
# store the count before 'nodes' is overwritten by the next query
VARIABLE_NAME01 <- length(nodes)
nodes <- getNodeSet(doc, "//ns:w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana, '#put_together') and contains(@ana, '#ANT')]", ns)
VARIABLE_NAME02 <- length(nodes)

I don't know whether there is a simpler way.
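One simpler way is to wrap the query in a small function and apply it to every value, so that no intermediate variable gets overwritten. A minimal sketch, assuming the XML package and the same query as above; the helper count_action_verbs and the miniature TEI fragment are invented for illustration:

```r
library(XML)

# Miniature TEI fragment, invented for illustration only
tei <- '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
  <w type="verb" ana="#action #displacement #ANT">ran</w>
  <w type="verb" ana="#action #displacement #ANT">fled</w>
  <w type="verb" ana="#action #destruction #ANT">broke</w>
  <w type="noun" ana="#action #destruction #ANT">ruin</w>
</p></body></text></TEI>'
doc <- xmlParse(tei, asText = TRUE)
ns <- c(ns = "http://www.tei-c.org/ns/1.0")

# Hypothetical helper: number of '#action' + '#ANT' verbs whose @ana also contains `value`
count_action_verbs <- function(value, doc, ns) {
  query <- sprintf("//ns:w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana,'%s') and contains(@ana,'#ANT')]", value)
  length(getNodeSet(doc, query, ns))
}

values <- c("#displacement", "#put_together", "#destruction")
# One integer count per value; vapply() names the result after the values
counts <- vapply(values, count_action_verbs, integer(1), doc = doc, ns = ns)
counts
sum(counts)
```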

answered 2017-03-13T13:05:12.533

A colleague, an R expert, helped me and suggested:

# Values of @ana to count; each keeps its own single quotes so it can be pasted into the XPath
typeAction <- c("'#displacement'", "'#put_together'", "'#agression'",
                "'#confrontation'", "'#movement'", "'#otherAction'")

# One XPath query per value (paste0() is vectorised)
queries <- paste0("//ns:w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana, ",
                  typeAction, ") and contains(@ana, '#ANT')]")

# Total number of matching nodes across all values
total_action_ANT <- 0
for (i in seq_along(queries)) total_action_ANT <- total_action_ANT + length(getNodeSet(doc, queries[i], ns))
total_action_ANT

# Keep each node set, then tabulate the number of occurrences per value
nodelist <- lapply(queries, function(q) getNodeSet(doc, q, ns))
str(nodelist)
resultats <- data.frame(action = typeAction, occurences = vapply(nodelist, length, integer(1)))
resultats

It works very well! I hope this helps.

answered 2017-03-15T10:25:11.283