Suppose I have the following text:

1) "Project:ABC is located near CBA, being too far from city  "
2) "P r o j e c t : PQR is located near RQP, highlights some greenary"

I want to extract the text between the word "Project" and the first ",", so that my output is "ABC is located near CBA" from text 1 and "PQR is located near RQP" from text 2. For that I used this regular expression:

x="Project:ABC is located near CBA, being too far from city  "
sub(".*Project: *(.*?) *, .*", "\\1", x)
Output:
ABC is located near CBA

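But for text 2 it does not give the correct output, because the letters of "Project" are separated by spaces. How can I include an OR condition so that both cases are matched? Any suggestion would be helpful. Thanks.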

4 Answers

You can use a regular expression with lookahead and lookbehind assertions.

Using the stringr package on a small example:

Vec <- c("Project:ABC is located near CBA, being too far from city", 
         "P r o j e c t : PQR is located near RQP, highlights some greenary")
library(stringr)
str_extract(Vec, "(?<=:).*(?=,)")
#> [1] "ABC is located near CBA"  " PQR is located near RQP"

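If your input is more complex, you should adapt the regex, as it may not be strict enough (currently it captures everything between the first : and the last ,). Note also that the second result keeps the leading space that follows the colon.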
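
The pattern above uses a greedy .*, so when the input contains several commas the match extends to the last one. Here is a minimal sketch of that behavior in base R (regmatches and regexpr with perl = TRUE rather than stringr, so it runs without extra packages). The two-comma input is made up for illustration, and the [^,]* variant is only one possible tightening, not necessarily the adjustment the answer had in mind.

```r
# Hypothetical input with two commas, invented for this illustration
x <- "Project:ABC is located near CBA, being too far from city, noisy"

# Greedy .* runs up to the LAST comma
greedy <- regmatches(x, regexpr("(?<=:).*(?=,)", x, perl = TRUE))

# [^,]* cannot cross a comma, so the match stops at the FIRST comma
strict <- regmatches(x, regexpr("(?<=:)[^,]*(?=,)", x, perl = TRUE))

# trimws() drops the leading space left after "P r o j e c t :"
spaced <- "P r o j e c t : PQR is located near RQP, highlights some greenary"
trimmed <- trimws(regmatches(spaced, regexpr("(?<=:)[^,]*(?=,)", spaced, perl = TRUE)))

print(greedy)   # "ABC is located near CBA, being too far from city"
print(strict)   # "ABC is located near CBA"
print(trimmed)  # "PQR is located near RQP"
```

Trimming after extraction is used here because a lookbehind must have a bounded length, so the optional spaces cannot simply be folded into (?<=:).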

answered 2017-09-14T11:15:51.687

Make your regex a bit more flexible: [^:]+:\s*([^,]+),.*

> sub("[^:]+:\\s*([^,]+),.*", "\\1", "P r o j e c t : PQR is located near RQP, highlights some greenary")
[1] "PQR is located near RQP"

> sub("[^:]+:\\s*([^,]+),.*", "\\1", "Project:ABC is located near CBA, being too far from city  ")
[1] "ABC is located near CBA"
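
One thing to keep in mind with this approach: when the pattern does not match, sub returns the input unchanged rather than an empty string. Below is a minimal sketch of guarding against that with grepl. The helper name extract_project and the third, non-matching test string are hypothetical and only serve this illustration.

```r
pattern <- "[^:]+:\\s*([^,]+),.*"

# Hypothetical helper: returns NA when the pattern does not match,
# instead of silently passing the original string through
extract_project <- function(x) {
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

texts <- c("Project:ABC is located near CBA, being too far from city  ",
           "P r o j e c t : PQR is located near RQP, highlights some greenary",
           "no colon or comma here")  # invented non-matching input

result <- extract_project(texts)
print(result)
# [1] "ABC is located near CBA" "PQR is located near RQP" NA
```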
answered 2017-09-14T11:23:41.250

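One option is base R gsub: match characters (.*) up to a : followed by zero or more whitespace characters (\\s*), or (|) a , followed by any other characters, and replace the match with an empty string ("").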

gsub(".*:\\s*|,.*", "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP"
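
A caveat worth illustrating: because .* is greedy, .*:\\s* removes everything up to the last colon in the string. The sketch below applies the same call to the third string from the Data section, which has a second colon after the comma, and compares it with a variant that can only reach the first colon. The ^[^:]* variant is an assumption of this sketch, not part of the original answer.

```r
x <- "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No"

# Greedy .* consumes up to the LAST colon, so only the tail survives
greedy <- gsub(".*:\\s*|,.*", "", x)

# Variant: ^[^:]* can only reach the FIRST colon; \\s*,.* then drops the rest
first_colon <- gsub("^[^:]*:\\s*|\\s*,.*", "", x)

print(greedy)       # "Sr No"
print(first_colon)  # "Ganga gnd A3 And 3.."
```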

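If we need to match Project followed by : specifically, allowing optional whitespace between the letters:

# gsub("", ...) inserts \\s* between the letters of "Project";
# the extra \\s* before ":" also tolerates a space there, as in "P r o j e c t :"
pat <- paste0(gsub("", "\\\\s*", "Project"), "\\s*:\\s*|\\s*,.*")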
gsub(pat, "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP" "Ganga gnd A3 And 3.."   

Data

Vec <- c("Project:ABC is located near CBA, being too far from city", 
 "P r o j e c t : PQR is located near RQP, highlights some greenary", 
 "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No"
 )
answered 2017-09-14T11:17:25.973

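If matching the word Project itself is not required, you can cut the string at the positions of the : and the , instead: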

> text
[1] "Project:ABC is located near CBA, being too far from city  "
> substr(text,grep(":",strsplit(text,'')[[1]]),grep(",",strsplit(text,'')[[1]]))
[1] ":ABC is located near CBA,"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] "ABC is located near CBA"
> text <- "P r o j e c t : PQR is located near RQP, highlights some greenary"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] " PQR is located near RQP"

That should work fine! Note that the second result keeps the leading space after the colon.
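
The calls above process one string at a time, since strsplit(text, '')[[1]] only looks at the first element. As a sketch of a vectorized variant of the same idea, regexpr with fixed = TRUE returns the position of the first occurrence in every element. The helper name between_first is hypothetical, and the NA handling for strings without delimiters is an assumption about the desired behavior.

```r
# Hypothetical helper: text between the first ":" and the first "," of each string
between_first <- function(x) {
  start <- as.integer(regexpr(":", x, fixed = TRUE))  # first ":" (-1 if absent)
  end   <- as.integer(regexpr(",", x, fixed = TRUE))  # first "," (-1 if absent)
  out <- trimws(substr(x, start + 1, end - 1))        # trimws() drops the leading space
  out[start < 0 | end < 0] <- NA_character_           # no delimiters found
  out
}

Vec <- c("Project:ABC is located near CBA, being too far from city",
         "P r o j e c t : PQR is located near RQP, highlights some greenary",
         "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No")

result <- between_first(Vec)
print(result)
```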

answered 2017-09-14T11:19:51.620