<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090">

How can I extract only the link and the id from the element above?

Desired output:

       link                                                         
[1] http://twitter.com/MEDClementz/statuses/1001775473305817090    
           id
[1] 1001775473305817090

2 Answers


It is better to use an XML parser than regular expressions. With the xml2 package, each attribute can be read by name:

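library(xml2)

# Parse the element; a closing </item> tag is appended so the snippet is well-formed XML
x <- read_xml('<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090"></item>')

# Read each attribute by name; both values are returned as character strings
xml_attr(x, "link")
xml_attr(x, "id")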

Result:

> xml_attr(x,"link")
[1] "http://twitter.com/MEDClementz/statuses/1001775473305817090"
> xml_attr(x,"id")
[1] "1001775473305817090"
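
If the real file holds many such elements, the same idea extends naturally. The following is only a sketch under assumptions: the wrapper element `<items>` and the second `<item>` are hypothetical and not part of the question. It collects every item with `xml_find_all()` and reads both attributes into a data frame.

```r
library(xml2)

# Hypothetical document: the <items> wrapper and the second <item> are made up
doc <- read_xml('<items xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <item xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090"/>
  <item xsi:type="itemWithRetweets" link="http://twitter.com/example/statuses/1" id="1"/>
</items>')

# Select every <item> node, then read the attributes from all of them at once
items <- xml_find_all(doc, ".//item")
tweets <- data.frame(
  link = xml_attr(items, "link"),
  id   = xml_attr(items, "id"),   # kept as character on purpose
  stringsAsFactors = FALSE
)
tweets
```

The id is kept as a character string because a value this large cannot be stored exactly in an R double; converting it with `as.numeric()` would lose the last digits.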
answered 2018-05-31T18:37:51.913

Here is an option using the stringr package.

library(stringr)

# Create the example string
string <- '<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090">'

# Split the string
string2 <- str_split(string, pattern = " ")[[1]]

# Get the link
link <- str_subset(string2, "link")
link2 <- str_extract(link, "http://.*[0-9]+")
link2
# [1] "http://twitter.com/MEDClementz/statuses/1001775473305817090"

# Get the id
id <- str_subset(string2, "id")
id2 <- str_extract(id, "[0-9]+")
id2
# [1] "1001775473305817090"
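
Splitting on spaces and filtering by the substrings "link" and "id" works for this string, but it could pick up the wrong piece if another attribute name or value happened to contain those letters. As a possible variation (a sketch, not the only way to do it), `str_match()` can capture the quoted value that directly follows each attribute name:

```r
library(stringr)

string <- '<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="itemWithRetweets" link="http://twitter.com/MEDClementz/statuses/1001775473305817090" id="1001775473305817090">'

# Capture the text between the quotes after each attribute name.
# The leading \\s requires whitespace before the name, so a longer
# attribute name that merely ends in "id" or "link" does not match.
link <- str_match(string, '\\slink="([^"]*)"')[, 2]
id   <- str_match(string, '\\sid="([^"]*)"')[, 2]

link
# [1] "http://twitter.com/MEDClementz/statuses/1001775473305817090"
id
# [1] "1001775473305817090"
```

This is still pattern matching on raw text, so an XML parser remains the safer choice for anything beyond a single known string.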
answered 2018-05-31T18:42:43.677