
I want to use a regular expression to detect URLs with SpamAssassin.

I found that the following works well in the various other tools I have used:

http(s)?://([a-zA-Z0-9.])+.[a-zA-Z]{2,3}

However, it does not work in SpamAssassin.

If I try the regex above, or anything similar to it, I get the following errors:

[root@~]spamassassin --lint
Aug 13 19:30:25.005 [38721] warn: Having no space between pattern and following word is deprecated at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14.
Aug 13 19:30:25.005 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14, near "var"
Aug 13 19:30:25.005 [38721] warn:  (Missing operator before var?)
Aug 13 19:30:25.005 [38721] warn: Misplaced _ in number at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14.
Aug 13 19:30:25.005 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14, near "72_active"
Aug 13 19:30:25.005 [38721] warn:  (Missing operator before active?)
Aug 13 19:30:25.005 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 16, near "", ruletype => "body"
Aug 13 19:30:25.005 [38721] warn:  (Missing operator before body?)
Aug 13 19:30:25.005 [38721] warn: String found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28, near "body"); 
Aug 13 19:30:25.005 [38721] warn:  last;
Aug 13 19:30:25.005 [38721] warn:  }
Aug 13 19:30:25.005 [38721] warn:  }
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn:  }
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn:  if ($scoresptr->{q{FUZZY_ERECT}}) {
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn:  foreach my $l (@_) {
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn: #line 1 ""
Aug 13 19:30:25.005 [38721] warn:  (Might be a runaway multi-line "" string starting on line 16)
Aug 13 19:30:25.006 [38721] warn: Having no space between pattern and following word is deprecated at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28.
Aug 13 19:30:25.006 [38721] warn: Misplaced _ in number at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28.
Aug 13 19:30:25.006 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28, near "25_replace"
Aug 13 19:30:25.006 [38721] warn:  (Missing operator before replace?)
Aug 13 19:30:25.006 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 30, near "", ruletype => "body"
Aug 13 19:30:25.006 [38721] warn:  (Missing operator before body?)
Aug 13 19:30:25.006 [38721] warn: String found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42, near "body"); 
Aug 13 19:30:25.006 [38721] warn:  last;
Aug 13 19:30:25.006 [38721] warn:  }
Aug 13 19:30:25.006 [38721] warn:  }
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn:  }
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn:  if ($scoresptr->{q{MORE_SEX}}) {
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn:  foreach my $l (@_) {
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn: #line 1 ""
Aug 13 19:30:25.006 [38721] warn:  (Might be a runaway multi-line "" string starting on line 30)
Aug 13 19:30:25.006 [38721] warn: Having no space between pattern and following word is deprecated at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42.
Aug 13 19:30:25.006 [38721] warn: Misplaced _ in number at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42.
Aug 13 19:30:25.006 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42, near "20_phrases"
Aug 13 19:30:25.006 [38721] warn:  (Missing operator before phrases?)
Aug 13 19:30:25.007 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 44, near "", ruletype => "body"
Aug 13 19:30:25.007 [38721] warn:  (Missing operator before body?)
Aug 13 19:30:25.007 [38721] warn: String found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 44, at end of line
Aug 13 19:30:25.007 [38721] warn:  (Missing semicolon on previous line?)
Aug 13 19:30:25.007 [38721] warn: rules: failed to compile Mail::SpamAssassin::Plugin::Check::_body_tests_0_3, skipping:
Aug 13 19:30:25.007 [38721] warn:  (Can't find string terminator '"' anywhere before EOF at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 44.)
Aug 13 19:30:25.140 [38721] warn: lint: 1 issues detected, please rerun with debug enabled for more information
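The log suggests the pattern is being cut short. SpamAssassin rules such as body take a Perl regex between / delimiters, so the unescaped // in http(s)?:// would presumably end the pattern right after http(s)?: and leave the rest to be parsed as Perl code, which fits the "Bareword found" and "Can't find string terminator" warnings. This is an assumption, since line 14 of local.cf itself is not shown. The helper below is hypothetical (plain Python, not part of SpamAssassin): it escapes every bare / before wrapping a pattern into a rule line, reusing the rule name HAS_LINK from the log.

```python
def to_slash_delimited_rule(rule_type: str, name: str, pattern: str) -> str:
    """Hypothetical helper: build a local.cf-style line such as 'body NAME /.../'.

    Existing escapes (e.g. '\\.' or '\\/') are kept as they are; every bare '/'
    is escaped so it cannot terminate the slash-delimited pattern early.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy an existing escape sequence unchanged.
            out.append(pattern[i : i + 2])
            i += 2
            continue
        # Escape a bare slash; copy everything else as-is.
        out.append("\\/" if ch == "/" else ch)
        i += 1
    return f"{rule_type} {name} /{''.join(out)}/"


print(to_slash_delimited_rule("body", "HAS_LINK", r"https?://([a-zA-Z0-9.])+\.[a-zA-Z]{2,3}"))
```

The printed line is what would go into local.cf; running spamassassin --lint again afterwards is the way to check that it compiles.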

1 Answer


This regex detects common URLs: https?:\/\/([a-zA-Z0-9_\-]+\.)+(mobi|[a-z]{2,3})
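A quick way to check what this pattern actually matches is to run it outside SpamAssassin. The sketch below uses Python's re module purely for illustration (SpamAssassin itself uses Perl regexes, which treat these constructs the same way); the sample strings are made up.

```python
import re

# The answer's pattern, unchanged. In Python, '\/' and '\-' are harmless
# escapes for a literal '/' and '-'.
URL_RE = re.compile(r"https?:\/\/([a-zA-Z0-9_\-]+\.)+(mobi|[a-z]{2,3})")

samples = [
    "visit https://www.example.com/page today",
    "mobile site: http://m.example.mobi",
    "no scheme here, just example.com",
]

for text in samples:
    match = URL_RE.search(text)
    # Print the matched URL prefix, or None when nothing matched.
    print(repr(text), "->", match.group(0) if match else None)
```

The match stops at the TLD, so a path such as /page is not included. The pattern is also case-sensitive (HTTP://EXAMPLE.COM does not match), so re.IGNORECASE in Python, or a trailing i flag on a Perl/SpamAssassin /.../ pattern, may be wanted.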

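It does not detect URLs that use generic TLDs longer than three letters. If you need to detect those as well, I would add them to the list next to mobi.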
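A minimal sketch of "adding them next to mobi", again in Python for illustration only. The TLD list is hypothetical; substitute the ones you need. With the answer's original pattern, http://example.museum only matches up to http://example.mus, because [a-z]{2,3} takes the first three letters; putting the longer names before the fallback fixes that.

```python
import re

# Hypothetical extra TLDs (longer than three letters) to accept; replace with
# the TLDs you actually need.
EXTRA_TLDS = ["mobi", "museum", "travel", "info"]


def build_url_regex(extra_tlds):
    """Answer's pattern with the 'mobi' alternative widened to a list of TLDs."""
    # Longest names first, and the generic [a-z]{2,3} fallback last, so that a
    # long TLD is tried before the fallback can match just its first letters.
    names = sorted((re.escape(t) for t in extra_tlds), key=len, reverse=True)
    tld_group = "(" + "|".join(names + ["[a-z]{2,3}"]) + ")"
    return re.compile(r"https?:\/\/([a-zA-Z0-9_\-]+\.)+" + tld_group)


URL_RE = build_url_regex(EXTRA_TLDS)
ANSWER_RE = re.compile(r"https?:\/\/([a-zA-Z0-9_\-]+\.)+(mobi|[a-z]{2,3})")

for text in ["http://example.museum", "http://example.com"]:
    print(text, "| answer:", ANSWER_RE.search(text).group(0),
          "| extended:", URL_RE.search(text).group(0))
```

Ordering matters because regex alternation is tried left to right and the first alternative that lets the whole pattern succeed wins.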


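About your regex: if you want to match a literal dot, you have to escape it, and the same goes for other characters that have a special meaning in a regular expression, such as *, ? and / (the latter because it delimits the pattern).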
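To see the effect of the unescaped dot, the sketch below runs the question's pattern through Python's re module (chosen only for illustration; Perl behaves the same way for this pattern). Without the backslash, http://examplecom, which has no dot in the host at all, still matches. Note that in Python 3.7+ re.escape does not escape /, since / is not a regex metacharacter; it only needs escaping when it is also the delimiter of a /.../ pattern, as in a SpamAssassin rule.

```python
import re

# The question's pattern: the '.' before [a-zA-Z]{2,3} is unescaped, so it
# matches ANY character, not only a literal dot.
LOOSE = re.compile(r"http(s)?://([a-zA-Z0-9.])+.[a-zA-Z]{2,3}")
# The same pattern with that dot escaped, so it needs a real '.'.
STRICT = re.compile(r"http(s)?://([a-zA-Z0-9.])+\.[a-zA-Z]{2,3}")

for text in ["http://example.com", "http://examplecom"]:
    print(text, "loose:", bool(LOOSE.search(text)), "strict:", bool(STRICT.search(text)))

# re.escape backslash-escapes regex metacharacters such as '.', '*' and '?',
# which helps when a fixed string (e.g. a domain) must be matched literally.
print(re.escape("example.com/*?"))
```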

https://regex101.com is a good regex reference and testing site, and it also gives you helpful explanations of your patterns.

answered 2015-08-15T13:40:30.757