I get a strange encoding problem when I try to read a certain attribute of an XML/HTML document. Here is a reproducible example containing 2 items with 2 titles (note the French accented characters):
library(XML)
doc <- htmlParse('<note>
<item title="é">1</item>
<item title="ï">3</item>
</note>',asText=TRUE,encoding='UTF-8')
Using xpathApply, I can read my items like this. Note that the accented characters are displayed correctly here:
xpathApply(doc,'//item')
[[1]]
<item title="é">1</item>
[[2]]
<item title="ï">3</item>
But when I try to read the title attribute, I get this:
xpathApply(doc,'//item',xmlGetAttr,'title')
[[1]]
[1] "é"
[[2]]
[1] "ï"
I also tried other XPath variants, such as:
xpathApply(doc,'//item/@title')
xmlAttrs(xpathApply(doc,'//item')[[1]])
But these don't work either. Any help, please?