There is a Python library, Newspaper3k, that makes it easier to get the content of web pages. [newspaper][1]
For title retrieval:
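from newspaper import Article
a = Article(url)
a.download()  # fetch the page HTML
a.parse()     # extract the title, text, and other fields
print(a.title)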
For content retrieval:
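url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)
article.download()  # fetch the page HTML
article.parse()     # extract the title, text, and other fields
print(article.text)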
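I want to get information about web pages (sometimes the title, sometimes the actual content). Here is my code to get the content/text of the web pages: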
from newspaper import Article
import nltk
nltk.download('punkt')
fil = open("laborURLsml2.csv", "r")
# Read every line in fil
Lines = fil.readlines()
for line in Lines:
    print(line)
    article = Article(line)
    article.download()
    article.html
    article.parse()
    print("[[[[[")
    print(article.text)
    print("]]]]]")
The contents of the file "laborURLsml2.csv" are: [laborURLsml2.csv][2]
My problem is: my code reads the first URL and prints its content, but fails to read the second URL.
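One possible cause, which nothing above confirms, is that readlines() keeps the trailing newline on every line, so Article receives a URL that ends in "\n". Separately, when a download fails, article.parse() raises an exception and the loop stops at that URL. The following is a minimal sketch of a more tolerant loop, assuming the file holds one URL per line; clean_urls and print_articles are hypothetical helper names used for illustration, not part of Newspaper3k.

```python
def clean_urls(lines):
    """Return the URLs in `lines` with surrounding whitespace removed."""
    urls = []
    for line in lines:
        url = line.strip()  # drops the trailing "\n" that readlines() keeps
        if url:             # skip blank lines
            urls.append(url)
    return urls


def print_articles(path):
    """Download and print the text of every URL listed in `path`."""
    # Imported here so clean_urls() can be used without newspaper installed.
    from newspaper import Article
    from newspaper.article import ArticleException

    with open(path, "r") as fil:
        urls = clean_urls(fil)

    for url in urls:
        article = Article(url)
        try:
            article.download()
            article.parse()  # raises ArticleException if the download failed
        except ArticleException as err:
            print(f"could not read {url}: {err}")
            continue  # move on to the next URL instead of stopping
        print("[[[[[")
        print(article.text)
        print("]]]]]")
```

Calling print_articles("laborURLsml2.csv") would then report any URL that cannot be read and continue with the rest, which should at least show whether the second URL itself is the problem.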