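I would like to know why I get two different output strings with gsub and stringi. Does the metacharacter "." not include newlines in stringi? Does stringi read "line by line"?

By the way, I have not found any way to perform the "correct" substitution with stringi, so I need to use gsub here.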

string <- "is it normal?\n\nhttp://www.20minutes.fr"

> gsub(" .*?http"," http", string)
[1] "is http://www.20minutes.fr"

> stri_replace_all_regex(string, " .*?http"," http")
[1] "is it normal?\n\nhttp://www.20minutes.fr"
2 Answers

One way is to set . to also match line terminators instead of stopping at the end of a line:

stri_replace_all_regex(string, " .*?http"," http", 
                       opts_regex = stri_opts_regex(dotall = TRUE))
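
As far as I know, the same behaviour can also be switched on inside the pattern itself, because the ICU regex engine that stringi uses accepts the inline (?s) flag. A minimal sketch, assuming the stringi package is installed and reusing the string from the question (the result variable names are only illustrative):

```r
library(stringi)

string <- "is it normal?\n\nhttp://www.20minutes.fr"

# Default: "." stops at line terminators, so nothing is replaced.
default_result <- stri_replace_all_regex(string, " .*?http", " http")

# Inline (?s) turns on dotall for this pattern only.
inline_result <- stri_replace_all_regex(string, "(?s) .*?http", " http")

# Option-based form, as in the call above.
option_result <- stri_replace_all_regex(
  string, " .*?http", " http",
  opts_regex = stri_opts_regex(dotall = TRUE)
)

print(inline_result)
```

The inline flag is handy when the pattern is passed around as a plain string and there is no convenient place to supply opts_regex.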
answered 2015-04-15T11:03:33.890

By default (for historical reasons, see this tutorial) a dot doesn't match a newline character in most regex engines. As @lukeA suggested, to match a newline you may set the dotall option to TRUE in stringi regex-based functions.

By the way, gsub(..., perl=TRUE) gives results consistent with stringi.
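
To make the difference between the two gsub engines concrete, here is a small base R sketch that reuses the string from the question. The variable names are only illustrative, and the (?s) line is an addition showing how PCRE can be made to behave like the default engine again:

```r
string <- "is it normal?\n\nhttp://www.20minutes.fr"

# Default engine (TRE): "." also matches "\n", so the match spans lines.
tre_result <- gsub(" .*?http", " http", string)

# PCRE (perl = TRUE): "." does not match "\n", so the string is unchanged.
pcre_result <- gsub(" .*?http", " http", string, perl = TRUE)

# PCRE with the inline (?s) flag: "." matches "\n" again.
pcre_dotall_result <- gsub("(?s) .*?http", " http", string, perl = TRUE)

print(c(tre_result, pcre_result, pcre_dotall_result))
```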

answered 2015-05-01T19:11:18.167