I've patched together a script that runs through an imported list of urls and grabs all the "p" tags from the html sections with the class "holder". It works, but it only ever looks at the first url in the imported CSV:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

contents = []
with open('list.csv', 'r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # Add each url to list contents

for url in contents:  # Parse through each url in the list.
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "lxml")
    n = 0
    for container in soup.find_all("section", attrs={'class': 'holder'}):
        n += 1
        print('==', 'Section', n, '==')
        for paragraph in container.find_all("p"):
            print(paragraph)
Any ideas how I can get it to loop through every url and not just the one?
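One thing I'm wondering about is the CSV layout itself: csv.reader yields one row per line, so if list.csv keeps all the urls on a single comma-separated line, contents ends up holding just one row, and url[0] is only ever that row's first cell. A minimal sketch of flattening every cell out of every row before scraping, assuming the urls might be spread across rows and/or columns (the strip/skip logic is my guess, not part of the original script):

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Flatten every cell of every row, so it works whether list.csv
# has one url per line or several urls on one comma-separated line.
contents = []
with open('list.csv', 'r') as csvf:
    for row in csv.reader(csvf):
        for cell in row:
            if cell.strip():              # skip empty cells
                contents.append(cell.strip())

for url in contents:                      # each entry is now a single url string
    page = urlopen(url).read()
    soup = BeautifulSoup(page, "lxml")
    for n, container in enumerate(
            soup.find_all("section", attrs={'class': 'holder'}), start=1):
        print('==', 'Section', n, '==')
        for paragraph in container.find_all("p"):
            print(paragraph)

If the urls really are one per line already, then this should behave the same as the original and the problem is somewhere else.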