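I am trying to match HTTP links in my Ruby application with the regular expression below, but the output is invalid: a period is appended to the link, sometimes a period followed by a word, and the resulting link no longer works when I test it on the web.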
URL_PATTERN = Regexp.new %r{http://[\w/.%-]+}i
<input>.to_s.scan( URL_PATTERN ).uniq
Is there something wrong with the code above that scans for links?
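A minimal reproduction of the symptom, assuming a made-up sample string (the text and the `example.com` URLs are hypothetical, not taken from real tweets). Because `.` is inside the character class, the match keeps going through a sentence-ending period and through any word that follows it without a space:

```ruby
# Same pattern as in the question, written as a plain regexp literal.
URL_PATTERN = %r{http://[\w/.%-]+}i

# Hypothetical sample text: one link ends a sentence, the other is
# followed by a period and a word with no space in between.
sample = "Read http://example.com/page. Also http://example.com/a.Next one"

matches = sample.scan(URL_PATTERN).uniq
puts matches
# http://example.com/page.
# http://example.com/a.Next
```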
Code from the application:
require 'bundler/setup'
require 'twitter'
RECORD_LIMIT = 100
URL_PATTERN = Regexp.new %r{http://[\w/.%-]+}i
def usage
  warn "Usage: ruby #{File.basename $0} <hashtag>"
  exit 64
end
# Ensure that the hashtag has a hash symbol. This makes the leading '#'
# optional, which avoids the need to quote or escape it on the command line.
def format_hashtag(hashtag)
  (hashtag.scan(/^#/).empty?) ? "##{hashtag}" : hashtag
end
# Return a list of unique URLs found in the list of tweets.
def uniq_urls(tweets)
  tweets.map(&:text).grep( %r{http://}i ).to_s.scan( URL_PATTERN ).uniq
end
def search(hashtag)
  Twitter.search(hashtag, rpp: RECORD_LIMIT, result_type: 'recent')
end
if __FILE__ == $0
  usage unless ARGV.size >= 1
  hashtag = format_hashtag(ARGV[0])
  tweets = search(hashtag)
  puts uniq_urls(tweets)
end
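One possible adjustment, offered as a sketch rather than a definitive fix: scan each tweet text on its own instead of calling `to_s` on the whole array (in Ruby 1.9 and later, `Array#to_s` returns the `inspect` form), and strip trailing periods from each match. The names `LINK_PATTERN`, `extract_links`, and `sample_texts` are hypothetical, and the sample strings are made up.

```ruby
# Same character class as URL_PATTERN in the application code.
LINK_PATTERN = %r{http://[\w/.%-]+}i

# Hypothetical helper: collect matches per text, then drop trailing
# periods, which most likely belong to the sentence and not the link.
def extract_links(texts)
  texts.flat_map { |text| text.scan(LINK_PATTERN) }
       .map { |url| url.sub(/\.+\z/, '') }
       .uniq
end

# Made-up stand-ins for tweets.map(&:text).
sample_texts = [
  "Read http://example.com/page.",
  "Again http://example.com/page and http://example.com/a/b.html"
]

puts extract_links(sample_texts)
# http://example.com/page
# http://example.com/a/b.html
```

This only handles a period at the very end of a match. When a word follows the period with no space, as in `http://example.com/a.Next`, the pattern alone cannot tell where the link stops, so that case would need a different strategy.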