python - Live link checking with Python fails when run from a file

Question

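I wrote a script that checks whether a website contains a link to a given domain, in this case "twitter.com".

I appreciate that the way I am doing this may not be the best, but I am new to Python and to programming in general.

Anyway, I am trying to run it against a file of links: the raw input of a single URL would go away, and I would check multiple URLs from a file to see whether they contain "twitter.com".

Here is my code, which works but uses raw_input():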

    from bs4 import BeautifulSoup
    import requests

    link_list = []
    status = ' Live!!'
    domain = 'twitter.com'

    url = raw_input("Enter a website to extract the URL's from: ")

    r = requests.get('http://www.' + url)
    data = r.text
    soup = BeautifulSoup(data)

    for link in soup.find_all('a'):
        links = (link.get('href'))
        link_list.append(links)

    if domain in ', '.join(link_list):
        print url + status

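Just to clarify: I have a file of URLs, one per line, and I want to check whether each one contains "twitter.com".

I have tried a lot of approaches, but it just won't work!

Any help is much appreciated.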

score 1 · Accepted Answer

If you want to open a file and read its lines into a list, it is simple:

with open(filename) as f:
    urls = f.readlines()

After that, urls is a list with one entry per line of the file.
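One detail worth knowing here: `readlines()` keeps the newline at the end of every line, so the entries usually need stripping before they are appended to 'http://www.'. A minimal sketch of one way to do that while reading; the helper name `read_urls` is made up for illustration and is not part of the original answer:

```python
def read_urls(path):
    """Return the non-empty lines of a text file, without surrounding whitespace."""
    with open(path) as f:
        # line.strip() removes the trailing newline; blank lines are skipped
        return [line.strip() for line in f if line.strip()]
```

With a helper like this, `urls = read_urls(filename)` could stand in for the `readlines()` call.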

Then you can loop over that list:

    for url in urls:
        url = url.strip()  # readlines() keeps the trailing newline
        link_list = []
        r = requests.get('http://www.' + url)
        data = r.text
        soup = BeautifulSoup(data, 'html.parser')

        for link in soup.find_all('a'):
            href = link.get('href')
            if href:  # skip <a> tags that have no href
                link_list.append(href)

        if domain in ', '.join(link_list):
            print(url + status)
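The test `domain in ', '.join(link_list)` is a plain substring match, so it would also fire for a link to, say, "nottwitter.com" or for a URL that merely mentions the domain in its query string. If that matters, one possible refinement is to compare host names instead. The sketch below is written for Python 3 and uses only the standard library; the function name `links_to_domain` is an assumption for illustration, not something from the question or the answer:

```python
from urllib.parse import urlparse


def links_to_domain(hrefs, domain):
    """Return True if any href points at `domain` or one of its subdomains."""
    for href in hrefs:
        if not href:
            # link.get('href') gives None for <a> tags without an href
            continue
        try:
            host = urlparse(href).hostname
        except ValueError:
            # malformed URLs are ignored rather than treated as matches
            continue
        if host is None:
            # relative links such as "/about" have no host part
            continue
        if host == domain or host.endswith('.' + domain):
            return True
    return False
```

Inside the loop, `if links_to_domain(link_list, domain):` could then take the place of the substring check.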
