I am trying to work out how to filter some observations from a large dataset using dplyr and grepl. I am not wedded to grepl if other solutions would be more optimal.
Take this sample df:
df1 <- data.frame(fruit=c("apple", "orange", "xapple", "xorange",
"applexx", "orangexx", "banxana", "appxxle"), group=c("A", "B") )
df1
# fruit group
#1 apple A
#2 orange B
#3 xapple A
#4 xorange B
#5 applexx A
#6 orangexx B
#7 banxana A
#8 appxxle B
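I want to:
- filter out those cases beginning with 'x'
- filter out those cases ending with 'xx'
I have managed to work out how to get rid of everything that contains 'x' or 'xx', but not only those values that begin or end with them. Here is how to get rid of everything with 'xx' anywhere inside (not just at the end):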
df1 %>% filter(!grepl("xx",fruit))
# fruit group
#1 apple A
#2 orange B
#3 xapple A
#4 xorange B
#5 banxana A
This has obviously (from my point of view) "wrongly" filtered out 'appxxle'.
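A minimal base R sketch of why this happens: an unanchored pattern such as "xx" matches wherever those characters occur in the string, while the `$` anchor requires the match to sit at the very end. The vector `fruit` below is just a copy of the sample data's `fruit` column.

```r
fruit <- c("apple", "orange", "xapple", "xorange",
           "applexx", "orangexx", "banxana", "appxxle")

# Unanchored: "xx" may appear anywhere in the string
anywhere <- grepl("xx", fruit)
# Anchored with `$`: "xx" must be the last two characters
at_end <- grepl("xx$", fruit)

fruit[anywhere]  # includes "appxxle", which is why it was dropped
fruit[at_end]    # only the values that really end in "xx"
```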
I have never fully got to grips with regular expressions. I've been trying to modify code such as:
grepl("^(?!x).*$", df1$fruit, perl = TRUE)
to try and make it work within the filter command, but am not quite getting it.
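For the "starts with x" half, the negative-lookahead pattern above does work with `perl = TRUE`, but negating a plain `^` anchor should give the same logical vector and is easier to read. A sketch comparing the two on the sample values (the names `keep_lookahead` and `keep_negated` are illustrative only):

```r
fruit <- c("apple", "orange", "xapple", "xorange",
           "applexx", "orangexx", "banxana", "appxxle")

# Negative lookahead (needs perl = TRUE): TRUE when the string does NOT start with "x"
keep_lookahead <- grepl("^(?!x).*$", fruit, perl = TRUE)
# Same idea without lookahead: match "starts with x", then negate with `!`
keep_negated <- !grepl("^x", fruit)

stopifnot(identical(keep_lookahead, keep_negated))
fruit[keep_negated]
```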
Expected output:
# fruit group
#1 apple A
#2 orange B
#3 banxana A
#4 appxxle B
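As a sanity check on the expected output, both conditions can be merged into one regular expression with alternation (`|`): `^x` catches values starting with "x" and `xx$` catches values ending with "xx". This is a base R sketch; `drop` and `result` are illustrative names, not part of any API.

```r
df1 <- data.frame(fruit = c("apple", "orange", "xapple", "xorange",
                            "applexx", "orangexx", "banxana", "appxxle"),
                  group = c("A", "B"),
                  stringsAsFactors = FALSE)

# TRUE for rows to remove: starts with "x" OR ends with "xx"
drop <- grepl("^x|xx$", df1$fruit)
result <- df1[!drop, , drop = FALSE]
rownames(result) <- NULL  # renumber rows 1..n like the expected output
result
```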
I'd like to do this inside dplyr if possible.
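One way to do it inside dplyr (a sketch, not the only option) is to pass both negated conditions to `filter()`, which keeps only the rows where every condition is TRUE. Since grepl is not a requirement, base R's `startsWith()` and `endsWith()` (available since R 3.3.0) avoid regular expressions entirely; they need character input, hence `stringsAsFactors = FALSE` below.

```r
library(dplyr)

df1 <- data.frame(fruit = c("apple", "orange", "xapple", "xorange",
                            "applexx", "orangexx", "banxana", "appxxle"),
                  group = c("A", "B"),
                  stringsAsFactors = FALSE)

# Regex version: comma-separated conditions in filter() are combined with AND
res_regex <- df1 %>%
  filter(!grepl("^x", fruit), !grepl("xx$", fruit))

# Non-regex version using base R prefix/suffix helpers
res_fixed <- df1 %>%
  filter(!startsWith(fruit, "x"), !endsWith(fruit, "xx"))

res_regex
```

The `startsWith()`/`endsWith()` form reads closer to the stated requirement and sidesteps regex escaping, while the grepl form generalises more easily if the patterns later become more complex.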