I can't seem to extract the email address from the following phrase:

“mailto:fwwrp-3492801490@yahoo.com?”

So far I have tried:

regexpr(":([^\\?*]?)", phrase)

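The logic of the code is as follows:

  1. Start at the colon character :
  2. Take every character that is not a question mark
  3. Return the characters captured inside the parentheses.

I'm not sure where my regex is going wrong.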

2 Answers

Let's look at your regex and see where it goes wrong. We'll pull it apart so it is easier to talk about:

:            Just a literal colon, no worries here.
(            Open a capture group.
    [        Open a character class, this will match one character.
        ^    The leading ^ means "negate this class"
        \\   This ends up as a single \ when the regex engine sees it and that will
             escape the next character.
        ?    This has no special meaning inside a character class, sometimes a
             question mark is just a question mark and this is one of those
             times. Escaping a simple character doesn't do anything interesting.
        *    Again, we're in a character class so * has no special meaning.
    ]        Close the character class.
    ?        Zero or one of the preceding pattern.
)            Close the capture group.

Stripping out the noise gives us :([^?*]?)

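So your regex actually matches:

A colon followed by zero or one character that is neither a question mark nor an asterisk; that character, if present, ends up in the first capture group.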
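To make that concrete, here is a minimal sketch in base R that runs the original pattern against the phrase from the question. Pairing `regexpr` with `regmatches` to pull out the matched text is an assumption about how the result would be inspected; the question only shows the `regexpr` call.

```r
phrase <- "mailto:fwwrp-3492801490@yahoo.com?"

# Original pattern: a colon, then at most one character that is not ? or *
m <- regexpr(":([^\\?*]?)", phrase)

# regmatches() returns the text of the match located by regexpr()
matched <- regmatches(phrase, m)
matched
# [1] ":f"
```

The match stops after a single character because the trailing `?` allows at most one repetition of the character class.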

That's quite different from what you're trying to do. A small tweak should fix your problem:

:([^?]*)

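That matches:

A colon followed by any number of characters that are not question marks; those characters end up in the first capture group.

The * is special outside a character class, where it means "zero or more"; inside a character class it is just a *.

I'll leave it to someone else to help you with the R side of things; I just wanted you to understand what's going on with the regex.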
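As a rough sketch of that R side, the adjusted pattern could be applied with base R's `regexec` and `regmatches`, which return capture groups as well as the full match. The variable names and the `sub` alternative are illustrative assumptions, not something taken from the question.

```r
phrase <- "mailto:fwwrp-3492801490@yahoo.com?"

# regexec() records the whole match and each capture group;
# regmatches() then returns them as a character vector per input string
parts <- regmatches(phrase, regexec(":([^?]*)", phrase))

# Element 1 is the full match, element 2 is the first capture group
email <- parts[[1]][2]
email
# [1] "fwwrp-3492801490@yahoo.com"

# Alternative: match the whole string and keep only the captured group
email_sub <- sub("^[^:]*:([^?]*).*$", "\\1", phrase)
```

Plain `regexpr` reports only the position and length of the whole match, which is why `regexec` is used here to get at the group.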

answered 2012-12-24T05:22:05.963

Here's a very simple approach using gsub:

gsub("([a-z]+:)(.*)([?]$)", "\\2", "mailto:fwwrp-3492801490@yahoo.com?")
## Or, if you expect things other than lowercase letters before the colon
gsub("(.*:)(.*)([?]$)", "\\2", "mailto:fwwrp-3492801490@yahoo.com?")
## Or, discarding the first and third groups since they aren't very useful
gsub(".*:(.*)[?]$", "\\1", "mailto:fwwrp-3492801490@yahoo.com?")

Picking up where @TylerRinker left off, you can also use strsplit as follows (to avoid having to gsub the question mark):

strsplit("mailto:fwwrp-3492801490@yahoo.com?", ":|\\?", fixed=FALSE)[[1]][2]

What if you have a vector of such strings?

phrase <- c("mailto:fwwrp-3492801490@yahoo.com?", 
            "mailto:somefunk.y-address@Sqmpalm.net?")
phrase
# [1] "mailto:fwwrp-3492801490@yahoo.com?"  
# [2] "mailto:somefunk.y-address@Sqmpalm.net?"

## Using gsub
gsub("(.*:)(.*)([?]$)", "\\2", phrase)
# [1] "fwwrp-3492801490@yahoo.com"     "somefunk.y-address@Sqmpalm.net"

## Using strsplit
sapply(phrase, 
       function(x) strsplit(x, ":|\\?", fixed=FALSE)[[1]][2], 
       USE.NAMES=FALSE)
# [1] "fwwrp-3492801490@yahoo.com"     "somefunk.y-address@Sqmpalm.net"
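Since `strsplit` is itself vectorized over its input, the per-element anonymous function is not strictly needed. The following is a minimal sketch of that variant, assuming every string contains exactly one colon before the address; the bracket class `[:?]` is used here as an equivalent of the `:|\\?` alternation, and `emails` is an illustrative name.

```r
phrase <- c("mailto:fwwrp-3492801490@yahoo.com?",
            "mailto:somefunk.y-address@Sqmpalm.net?")

# Split every string on ':' or '?' in one call; the result is a list
# with one character vector of pieces per input string
pieces <- strsplit(phrase, "[:?]")

# Take the second piece of each element; vapply() checks that each
# result is a single character string
emails <- vapply(pieces, `[`, character(1), 2)
emails
# [1] "fwwrp-3492801490@yahoo.com"     "somefunk.y-address@Sqmpalm.net"
```

`vapply` is chosen over `sapply` only because it fails loudly if some element does not yield exactly one string.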

I prefer the brevity of the gsub approach.

answered 2012-12-24T05:03:22.477