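I want it to print every site that is not on the blacklist (the code so far is below). If you change pass in the last if statement to print(site), it prints everything that is on the blacklist, but as written it does not work: it does not print just the sites that are not blacklisted, which is my goal.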

import requests 
from bs4 import BeautifulSoup
from lxml import html, etree
import sys
import re
import fnmatch
url = ("http://stackoverflow.com")
blacklist = ['*stackoverflow.com*', '*stackexchange.com*']
r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')
for link in soup.select('a[href*="http"]'):
    site = (link.get('href'))
    site = str(site)
    for filtering in blacklist:
        if fnmatch.fnmatch(site, filtering):
            pass
        else:
            print(site)

1 Answer

You want something like this:

import requests
from bs4 import BeautifulSoup
from lxml import html, etree
import sys
import re
import fnmatch
url = ("http://stackoverflow.com")
blacklist = ['*stackoverflow.com*', '*stackexchange.com*']
r = requests.get(url, timeout=6, verify=True)
soup = BeautifulSoup(r.content, 'html.parser')
for link in soup.select('a[href*="http"]'):
    site = (link.get('href'))
    site = str(site)
    # Skip the link if it matches at least one blacklist pattern
    if any(fnmatch.fnmatch(site, filtering) for filtering in blacklist):
        continue
    print(site)

The problem is here (old code):

for filtering in blacklist:
    if fnmatch.fnmatch(site, filtering):
        pass
    else:
        print(site)

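When you iterate here, each site is tested against every pattern separately. A blacklisted site matches one pattern but not the other, so the else branch still runs for the pattern it does not match, and the site is always printed. (A site that matches neither pattern is printed once per pattern, so it shows up twice.) There are several ways to fix this; mine uses any() to check whether the match is True at least once and, if so, continues to the next link without printing :D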
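To see the difference without hitting the network, here is a minimal sketch that runs both approaches on a few made-up URLs. The blacklist comes from the question; the sample URLs and the helper names old_loop and is_blacklisted are hypothetical, chosen only for illustration.

```python
import fnmatch

# Patterns copied from the question; the URLs are made-up sample data.
blacklist = ['*stackoverflow.com*', '*stackexchange.com*']
sites = [
    'https://stackoverflow.com/questions',
    'https://example.com/page',
    'https://meta.stackexchange.com',
]


def old_loop(site, patterns):
    """Mimic the original inner loop: one entry per pattern that does NOT match."""
    return [site for pattern in patterns if not fnmatch.fnmatch(site, pattern)]


def is_blacklisted(site, patterns):
    """Return True if the site matches at least one blacklist pattern."""
    return any(fnmatch.fnmatch(site, pattern) for pattern in patterns)


# Old behaviour: blacklisted sites still get through, clean sites are repeated.
for site in sites:
    print('old:', old_loop(site, blacklist))

# Fixed behaviour: keep only sites that match no pattern at all.
allowed = [site for site in sites if not is_blacklisted(site, blacklist)]
print('new:', allowed)
```

With the old loop, each blacklisted URL is still emitted for the pattern it does not match, while the any() version keeps only the URL that matches neither pattern.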

Answered 2021-10-04T09:26:52.863