regex - Why doesn't this regular expression correctly catch periods?

Question

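I'm trying to learn more about shell scripting. I have some files containing emails that spamassassin writes to a directory, and I figured I'd try some regex matching on them. So I pick out the files that need different matches and then try to sort them.

I wrote this script: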

#!/usr/local/bin/bash
#
regex='(\.)?'
files="/var/spool/spam/testing/out.*"
for i in $files; do
    domain=`cat $i | grep -i "Message-ID: <" | cut -d'@' -f2 | cut -d'>' -f1 | cut -d' ' -f1`
    echo "Domain is $domain"
    echo "We're starting the if loop"
    if [ -z "$domain" ];
    then
        echo "Domain is empty"
        echo $i
        #rm $i
    elif ! [[ "$domain" =~ $regex ]];
    then
        echo "There are no periods in the domainname $domain"
    elif [[ $domain =~ $regex ]];
    then
        echo "There are periods in the domainname $domain"
    fi
done

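What I'm trying to accomplish is to isolate the domain part of the Message-ID: and then determine what that domain is. Some message IDs have no domain at all. Some have fake domains. Some have domains like this: yahoo.co.uk.

Each message has two Message-ID: entries, so the domain name ends up appearing twice.

When I run this script against two files, this is the result I get: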

# bash /usr/local/bin/rm-bounces.sh 
Domain is xbfoqrka
xbfoqrka
We're starting the if loop
There are periods in the domainname xbfoqrka
xbfoqrka
Domain is SKY-20150201SFT.com
SKY-20150201SFT.com
We're starting the if loop
There are periods in the domainname SKY-20150201SFT.com
SKY-20150201SFT.com

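I don't understand why xbfoqrka matches the regex that is supposed to look for a period in the domain name, yet does not match the branch that checks for NO period. I'm escaping the period, so it should be a literal match, and there is no period in xbfoqrka xbfoqrka.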

score 1 · Accepted Answer

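The ? quantifier means zero or one. So the regex is looking for zero or one literal period (.) in the text. Since xbfoqrka contains no period, the regex still finds a match (of zero periods).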
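The behaviour described above can be checked directly in bash. The following is a minimal sketch, not part of the original script; the helper name show_match is hypothetical and exists only for illustration. It prints how many characters the whole pattern consumed, using the BASH_REMATCH array that bash fills in after a successful =~ test.

```shell
#!/usr/bin/env bash
# Same pattern as in the question: an OPTIONAL literal period.
regex='(\.)?'

# show_match is a hypothetical helper for illustration only.
# It reports whether the pattern matched and how long the match was.
show_match() {
    local s=$1
    if [[ $s =~ $regex ]]; then
        # BASH_REMATCH[0] holds the text matched by the whole pattern.
        printf 'matched, length=%d\n' "${#BASH_REMATCH[0]}"
    else
        printf 'no match\n'
    fi
}

show_match "xbfoqrka"   # no period in the input, yet the test succeeds
show_match ".com"       # here the optional period is actually consumed
```

For xbfoqrka the match succeeds with a length of 0: the pattern is satisfied by the empty string at the very start of the input, which is why the "no periods" branch in the script is never reached.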

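Note that the regex will return true for any number of periods: zero, one, three, 100, and so on. That is because a string with 100 periods still contains zero or one period somewhere, and the pattern is not anchored to the whole string. To test whether a period is present, the period has to be required, which means dropping the ? quantifier.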
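One way to make the check behave as intended is to require the period rather than make it optional. This is a sketch under that assumption, using the sample domains from the question; the helper name has_period is hypothetical and is not taken from the original script.

```shell
#!/usr/bin/env bash
# A pattern that requires one literal period somewhere in the string.
regex='\.'

# has_period is a hypothetical helper for illustration only.
# Its exit status is 0 when the argument contains a period, 1 otherwise.
has_period() {
    [[ $1 =~ $regex ]]
}

for domain in xbfoqrka SKY-20150201SFT.com yahoo.co.uk; do
    if has_period "$domain"; then
        echo "There are periods in the domainname $domain"
    else
        echo "There are no periods in the domainname $domain"
    fi
done
```

For a test this simple, a glob comparison such as [[ $domain == *.* ]] would also work and avoids regex quoting altogether. A plain if/else is enough here, since the two conditions in the original elif chain are exact opposites.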
