facebook - R: Create a column from the hashtags in exported Facebook .csv data

Question

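I am analyzing the posts on my Facebook page to find out which kind of post is the most engaging, so I would like to create a column containing the hashtags each post uses. Here is a sample of the exported data: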

Post              Likes
Blah   #a          10
Blah Blah #b       12
Blah Bleh #a       10
Bleh   #b           9
Bleh Blah #a #b    15

I would like to create this:

Post              Likes   tags
Blah   #a          10      #a
Blah Blah #b       12      #b
Blah Bleh #a       10      #a
Bleh   #b           9      #b
Bleh Blah #a #b    15      #a #b
Bleh #b Blah #a    14      #a #b

Is this possible? I thought about using grepl to check for posts that contain "#", but I don't know what to do next.

score 2 · Accepted Answer

This seems to work:

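# random data
DF <- data.frame(Post = c("asd wer #a", "dfg #b gg",
                          "wer #c qwe qweeee #a #b", "asd asd, ioi #a #c"),
                 Likes = sample(1:50, 4), stringsAsFactors = FALSE)

# find tags: split each post on spaces and keep the tokens that contain "#"
# sapply returns a character vector, so Tags becomes an ordinary character column
Tags <- sapply(DF$Post, function(x) {
  spl <- unlist(strsplit(x, " "))
  paste(spl[grep("#", spl)], collapse = ",")
}, USE.NAMES = FALSE)

DF$Tags <- Tags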

> DF
                     Post Likes     Tags
1              asd wer #a     9       #a
2               dfg #b gg    10       #b
3 wer #c qwe qweeee #a #b    46 #c,#a,#b
4      asd asd, ioi #a #c    31    #a,#c
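The same split-and-filter idea can be applied to the sample data from the question. This is only a sketch: the helper name `extract_tags` is made up for illustration, the split is on runs of whitespace (the sample has several spaces between words), and the tags are joined with a space to match the layout the question asks for.

```r
# First five rows of the sample data from the question
posts <- data.frame(
  Post  = c("Blah   #a", "Blah Blah #b", "Blah Bleh #a",
            "Bleh   #b", "Bleh Blah #a #b"),
  Likes = c(10, 12, 10, 9, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one post on whitespace, keep the tokens that
# start with "#", and join them with a single space
extract_tags <- function(x) {
  tokens <- unlist(strsplit(x, "\\s+"))
  paste(tokens[grepl("^#", tokens)], collapse = " ")
}

# vapply guarantees one string per post, so tags is a plain character column
posts$tags <- vapply(posts$Post, extract_tags, character(1), USE.NAMES = FALSE)
print(posts)
```

Note that a token such as "#a," would keep its trailing comma with this approach, since only whitespace is treated as a separator.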

score 2 · Accepted Answer

For example, you can use gregexpr to find the desired pattern and regmatches to extract it:

txt = c('Bleh Blah #a #b','Blah Bleh #a')
regmatches(txt,gregexpr('#[a-z]',txt))   ## I assume a tag is # followed by a lowercase letter
[[1]]
[1] "#a" "#b"

[[2]]
[1] "#a"

Using alexis's example, you could write something like:

DF$tag <- regmatches(DF$Post,gregexpr('#[a-z]',DF$Post))

Edit, in case tags look like #hi (more than one letter):

txt = c('Bleh Blah #hi allo #b','Blah Bleh #a')
regmatches(txt,gregexpr('#[a-z]+',txt))

[[1]]
[1] "#hi" "#b" 

[[2]]
[1] "#a"
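regmatches returns a list with one character vector per post. To get a single text column like the one in the question, each element can be pasted into one string. The sketch below assumes a tag is "#" followed by letters, digits, or underscores (a broader pattern than the one above, and still only a guess at what the real data contains); the third input string is an invented example of a post without tags.

```r
txt <- c("Bleh Blah #hi allo #b", "Blah Bleh #a", "no tags here")

# One character vector of matches per element of txt;
# posts without a tag yield character(0)
matches <- regmatches(txt, gregexpr("#[[:alnum:]_]+", txt))

# Join each post's tags with a space; a post with no tags gives ""
tags <- vapply(matches, paste, character(1), collapse = " ")
print(tags)
```

Because `tags` is a character vector rather than a list, it can be assigned to a data frame column and written back out with write.csv.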
