python - How to test external URLs or links in a Django website?

Question

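Hi, I am building a blog website in Django 1.8 with Python 3. Users write blog posts and sometimes add external links to them. I want to crawl every page of this blog site and test whether each external link the users provided is valid.

How can I do this? Should I use something like Python Scrapy?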

score 1 · Accepted Answer

import urllib.request  # Python 3 replacement for the Python 2-only urllib2

def site_checker(url):
    # Prepend a scheme when the URL does not already start with one
    if not url.startswith(('http://', 'https://')):
        url = 'http://%s' % url
    print(url)

    try:
        # urlopen raises on DNS failures, refused connections and HTTP 4xx/5xx
        response = urllib.request.urlopen(url, timeout=10).read()
        if response:
            print('site is legit')
    except Exception:
        print('not a legit site yo!')

site_checker('google')  # not a complete URL
site_checker('http://google.com')  # this works

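Hope this works. urllib reads the site's HTML; if the response is not empty, the site is treated as legit. If the request raises an exception instead (for example, the host cannot be resolved or the server returns an error), it is treated as not a valid site. I also added a URL check that prepends http:// when it is missing.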
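The checker above takes one URL at a time, while the question asks about finding the external links inside blog pages first. A minimal sketch of that step, using only the Python 3 standard library, might look like the following. The names `LinkCollector`, `external_links` and `link_is_alive` are hypothetical helpers, not part of Django or of the answer above, and the sketch assumes you already have each page's HTML and its own URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import urllib.request


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def external_links(html, page_url):
    """Return absolute http(s) links in html that point to a different host."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the page they appeared on
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme in ('http', 'https') and parts.netloc != own_host:
            found.append(absolute)
    return found


def link_is_alive(url, timeout=10):
    """Return True if the URL can be opened without raising an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except Exception:
        return False
```

One possible use is to call `external_links` on the rendered HTML of each post and report every link for which `link_is_alive` returns False. A crawler such as Scrapy could supply the pages instead, but for a site you own, reading the posts directly may be simpler.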
