regex - Perl regular expression in R to extract a numeric count

Question

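I am using R and am new to regular expressions. I need a regex to extract 'statuses_count' from JSON-like text. The data is organized in a data frame, with the text in each row. Sample data row: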

{'lang': u'en', 'profile_background_tile': False, 'statuses_count': 4414, 'description': u'Progessive,interested in the psychology of politics.

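The result should be: 4414.

I was thinking of using str_extract_all with the perl option, but I don't understand how to get only the number that follows 'statuses_count', something like (?<=statuses_count.:)(something).

As a newbie, it would be great to understand how to say "grab the number after 'statuses_count'". Thanks!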

score 3 · Accepted Answer

1) sub. A simple solution that needs no packages.

sub(".*'statuses_count': (\\d+).*", "\\1", x)
## [1] "4414"
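
To illustrate why the pattern is wrapped in `.*` on both sides, here is a minimal sketch. It assumes `x` holds a shortened version of the sample row (the answer itself does not define `x`). Since sub() replaces only the matched portion, the match has to span the whole string for the digits alone to remain.

```r
# Assumption: x is a shortened version of the sample row from the question.
x <- "{'lang': u'en', 'statuses_count': 4414, 'description': u'Progessive'}"

# With .* on both sides the match covers the entire string,
# so replacing it with capture group 1 leaves just the digits.
full <- sub(".*'statuses_count': (\\d+).*", "\\1", x)

# Without the surrounding .* only the key and its number are replaced,
# and the rest of the string is kept.
partial <- sub("'statuses_count': (\\d+)", "\\1", x)

print(full)
print(partial)
```

The second call shows the failure mode to watch for: the number is found, but everything around it survives the replacement.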

The regular expression, written without R's string escaping:

.*'statuses_count': (\d+).*

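2) gsub. If we know there are no other digits in the string (as is the case in the example), it is even easier, since we can simply delete every non-digit: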

gsub("\\D", "", x)
## [1] "4414"
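
The "no other digits" condition matters in practice. The sketch below uses a hypothetical row `y` (the `followers_count` field is made up for illustration) to show what happens when a second number is present.

```r
# Hypothetical row that also contains digits outside statuses_count.
y <- "{'followers_count': 12, 'statuses_count': 4414}"

# Deleting every non-digit glues the two numbers together.
glued <- gsub("\\D", "", y)

# Anchoring on the key still isolates the intended number.
anchored <- sub(".*'statuses_count': (\\d+).*", "\\1", y)

print(glued)
print(anchored)
```

For rows like this, the key-anchored pattern from 1) is the safer choice.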

The regular expression, written without R's string escaping:

\D

3) strapply or strapplyc. This approach involves a somewhat simpler regular expression:

library(gsubfn)
strapplyc(x, "'statuses_count': (\\d+)", simplify = TRUE)
## [1] "4414"

Or, if you want numeric output:

strapply(x, "'statuses_count': (\\d+)", as.numeric, simplify = TRUE)
## [1] 4414

The regular expression, written without R's string escaping:

'statuses_count': (\d+)

Note: None of these require the Perl regular expression extensions. Ordinary regular expressions will work.
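
As a rough base-R counterpart to the capture-group approach in 3), the following sketch uses regexec() and regmatches() with R's default (non-Perl) regex engine. The second element of `x` is a hypothetical row without the key, added only to show how a non-match can be handled.

```r
# Assumption: a shortened sample row plus a hypothetical row lacking the key.
x <- c("{'lang': u'en', 'statuses_count': 4414}",
       "{'lang': u'en'}")

# regexec() uses the default regex engine unless perl = TRUE is given.
m <- regmatches(x, regexec("'statuses_count': (\\d+)", x))

# Each element holds the full match followed by capture group 1;
# rows without a match give character(0), which is mapped to NA here.
counts <- vapply(
  m,
  function(g) if (length(g) >= 2) as.numeric(g[2]) else NA_real_,
  numeric(1)
)

print(counts)
```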

score 2 · Accepted Answer

Here I use a perl regular expression, going by the title of the post.

library(stringr)
# perl() belongs to older stringr releases; it was later deprecated in favour of regex().
str_extract_all(str1, perl("(?<=statuses_count': )\\d+"))[[1]]
#[1] "4414"
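
The same lookbehind can also be tried in base R with `perl = TRUE`. This is a sketch, not part of the original answer. It also suggests why the attempt in the question may have come up empty: a lookbehind must match the text immediately before the digits, and the sample has a space after the colon.

```r
# Assumption: a shortened version of the sample row.
str1 <- "{'lang': u'en', 'profile_background_tile': False, 'statuses_count': 4414}"

# perl = TRUE switches to PCRE, which supports the (?<=...) lookbehind.
m <- regexpr("(?<=statuses_count': )\\d+", str1, perl = TRUE)
hit <- regmatches(str1, m)

# The lookbehind from the question omits the space after the colon,
# so no digit is directly preceded by it and nothing matches.
m_q <- regexpr("(?<=statuses_count.:)\\d+", str1, perl = TRUE)
miss <- regmatches(str1, m_q)

print(hit)
print(length(miss))
```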

The regular expression, written without R's string escaping:

(?<=statuses_count': )\d+

Or use stringi (faster for large datasets):

library(stringi)
stri_extract_all_regex(str1, "(?<=statuses_count': )\\d+")[[1]]
#[1] "4414"

Data

str1 <- "{'lang': u'en', 'profile_background_tile': False, 'statuses_count': 4414, 'description': u'Progessive,interested in the psychology of politics."
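
Since the question mentions that the text sits in a data frame, here is a minimal base-R sketch of applying the extraction to a whole column. The data frame `df`, its column name `text`, and the rows themselves are assumptions made for illustration.

```r
# Hypothetical data frame; the column name `text` is an assumption.
df <- data.frame(
  text = c("{'lang': u'en', 'statuses_count': 4414}",
           "{'lang': u'de', 'statuses_count': 27}",
           "{'lang': u'fr'}"),
  stringsAsFactors = FALSE
)

pattern <- ".*'statuses_count': (\\d+).*"

# sub() returns its input unchanged when nothing matches,
# so rows without the key are left as NA instead of being converted.
has_count <- grepl(pattern, df$text)
df$statuses_count <- NA_real_
df$statuses_count[has_count] <- as.numeric(sub(pattern, "\\1", df$text[has_count]))

print(df$statuses_count)
```

Checking with grepl() first avoids the coercion warning that as.numeric() would raise on an unmatched row.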
