I'm trying to match some email addresses with a regular expression in bash. So far I have this expression:

"^[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"

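which successfully matches all the emails I need. But when I try to add the "To:" field, I can't get any matches, and I don't know why. Here is my code for the To field: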

"^To:\s[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"

which AFAIK should match "To: bob@bob.co.uk", but it doesn't :( Any suggestions?

Sample input

Reply-To: "service@paypal.com" <service@paypal.com>
To: bob@bob.co.uk
Date: Mon, 21 Jun 2012 21:34:10 -0300

Code used to search the file and add matches to an array

regex="^[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+(\.[a-zA-Z0-9!#\$%&'\*\+/=?^_\`{|}~-]+)*@([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\$"


for i in $(cat mailbox.mbx); do
    if [[ $i =~ $regex ]]; then
        echo $i
        sortarray[$index]=$i
        index=$(($index+1))
    fi
done
2 Answers

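Bash's =~ operator uses POSIX extended regular expressions, which don't define the Perl-style \s (some C libraries accept it as an extension, but that isn't portable). You have to use the POSIX-style [[:space:]]. You should also add a quantifier there, e.g. [[:space:]]+.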
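As a minimal sketch of that substitution, the snippet below swaps \s for [[:space:]]+ in a "To:" pattern. The `addr` pattern is a deliberately simplified stand-in for the question's full expression, and the variable names (`addr`, `re_to`, `to_addr`) are assumptions made only for illustration.

```shell
#!/usr/bin/env bash
# Simplified address pattern (an assumption, not the question's full regex):
# one or more non-space, non-@ characters on each side of a single @.
addr='[^[:space:]@]+@[^[:space:]@]+'

# POSIX character class plus a quantifier instead of the Perl-style \s.
re_to="^To:[[:space:]]+(${addr})\$"

line='To: bob@bob.co.uk'
if [[ $line =~ $re_to ]]; then
    # Group 1 is the address without the "To:" prefix.
    to_addr=${BASH_REMATCH[1]}
    echo "$to_addr"
fi
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of =~ avoids bash's own backslash and quoting rules interfering with the regex.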

I see you have anchors in $regex: are those tripping you up?
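One way the anchors can bite: the loop in the question, `for i in $(cat mailbox.mbx)`, splits the file on whitespace, so `To:` and `bob@bob.co.uk` are tested as separate words and an anchored pattern that spans both never sees the whole line. A possible sketch that reads line by line instead is shown here. The temporary file, the simplified `re_to` pattern, and the `recipients` array are assumptions for illustration, not part of the original code.

```shell
#!/usr/bin/env bash
# Recreate the sample headers from the question in a temporary file.
mbox=$(mktemp)
printf '%s\n' \
    'Reply-To: "service@paypal.com" <service@paypal.com>' \
    'To: bob@bob.co.uk' \
    'Date: Mon, 21 Jun 2012 21:34:10 -0300' > "$mbox"

# Simplified stand-in for the question's address pattern (an assumption).
re_to='^To:[[:space:]]+([^[:space:]@]+@[^[:space:]@]+)$'

recipients=()
# IFS= preserves leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r line; do
    if [[ $line =~ $re_to ]]; then
        recipients+=("${BASH_REMATCH[1]}")
    fi
done < "$mbox"
rm -f "$mbox"

printf '%s\n' "${recipients[@]}"
```

With whole lines as input, ^ and $ refer to the start and end of each header line rather than of each whitespace-separated word.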

For hefty regexes like this, I like to build them up piecewise:

char='[[:alnum:]!#\$%&'\''\*\+/=?^_\`{|}~-]'
name_part="${char}+(\.${char}+)*"
domain="([[:alnum:]]([[:alnum:]-]*[[:alnum:]])?\.)+[[:alnum:]]([[:alnum:]-]*[[:alnum:]])?"
begin='(^|[[:space:]])'
end='($|[[:space:]])'

# include capturing parentheses, 
# these are the ** 2nd ** set of parentheses (there's a pair in $begin)
re_email="${begin}(${name_part}@${domain})${end}"

line="To: joe.smith@example.com"

[[ $line =~ $re_email ]] && echo "${BASH_REMATCH[2]}"
# prints: joe.smith@example.com
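The pieces above could also be wrapped in a small function so the same pattern can be applied to each header line. This is only a sketch: `first_email` is a hypothetical helper name, and the pattern definitions are repeated so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Same building blocks as above, repeated so this snippet is self-contained.
char='[[:alnum:]!#\$%&'\''\*\+/=?^_\`{|}~-]'
name_part="${char}+(\.${char}+)*"
domain="([[:alnum:]]([[:alnum:]-]*[[:alnum:]])?\.)+[[:alnum:]]([[:alnum:]-]*[[:alnum:]])?"
re_email="(^|[[:space:]])(${name_part}@${domain})(\$|[[:space:]])"

# first_email (hypothetical helper): print the first address that is
# delimited by whitespace or the line boundaries; return 1 if none is found.
first_email() {
    [[ $1 =~ $re_email ]] || return 1
    printf '%s\n' "${BASH_REMATCH[2]}"
}

first_email 'To: bob@bob.co.uk'
first_email 'Date: Mon, 21 Jun 2012 21:34:10 -0300' || echo 'no address found'
```

Because the pattern requires whitespace or a line boundary on both sides of the address, addresses wrapped in quotes or angle brackets (as in the Reply-To line of the sample) are not matched by this sketch.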

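Of course, email addresses are quite complicated (see http://www.w3.org/Protocols/rfc822/#z8): comments and whitespace are allowed almost anywhere. In fact, (hi there) "My First Name".lastname (another comment) @ domain.(really)invalid should count as a valid address. The Perl module Email::Address generates this regex: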

$ perl -MEmail::Address -E 'say $Email::Address::addr_spec'  
(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*))\@(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*\[(?:\s*(?-xism:(?-xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D]))))*\s*\](?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)))
answered 2013-01-05T13:52:42.127

This regex should match the required string:

"^To: (.+@.+)$"

The email address is captured in the first group, $1 (in bash, that is ${BASH_REMATCH[1]}).
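In bash specifically, a sketch of using this looser pattern might look like the following. The variable names `re`, `line`, and `email` are assumptions for illustration. Note that `.+@.+` accepts far more than valid addresses, so this only fits input that is already known to be well formed.

```shell
#!/usr/bin/env bash
# Loose pattern: anything containing an @ after the literal "To: " prefix.
re='^To: (.+@.+)$'

line='To: bob@bob.co.uk'
if [[ $line =~ $re ]]; then
    # bash exposes capture groups through the BASH_REMATCH array, not $1.
    email=${BASH_REMATCH[1]}
    echo "$email"
fi
```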

answered 2013-01-05T12:04:30.460