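I am trying to build a simple web crawler that, when the search key is "legion", prints the URL of every Legion product listed on amazon.in. I am using the following code: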

import requests
from bs4 import BeautifulSoup

def legion_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'https://www.amazon.in/s?k=legion&qid=1588862016&swrs=82DF79C1243AF6D61651CCAA9F883EC4&ref=sr_pg_' + str(page)
        source_code = requests.get(url)
        plain_txt = source_code.text
        soup = BeautifulSoup(plain_txt)
        for link in soup.findAll('a', {'class': 'a-size-medium a-color-base a-text-normal'}):
            href = link.get('href')
            print(href)
        page += 1


legion_spider(1)

The output I get is this:

C:\Users\lenovo\AppData\Local\Programs\Python\Python38-32\python.exe "E:/Python Practice/web_crawler.py"
E:/Python Practice/web_crawler.py:10: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 10 of the file E:/Python Practice/web_crawler.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(plain_txt)

Process finished with exit code 0

1 Answer

You are missing the parser! Follow this part of the Beautiful Soup documentation:

 BeautifulSoup(markup, <parser>)
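
A minimal sketch of what that looks like in practice, assuming the built-in "html.parser" is good enough for you (the warning itself suggests it; "lxml" or "html5lib" would be passed the same way if installed). SAMPLE_HTML and extract_links are hypothetical names used only for illustration, and the markup is made up rather than taken from a real Amazon page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded search-results page.
SAMPLE_HTML = """
<div>
  <a class="a-size-medium a-color-base a-text-normal" href="/legion-5/dp/EXAMPLE1">Legion 5</a>
  <a class="a-link-normal" href="/other">Other</a>
  <a class="a-size-medium a-color-base a-text-normal" href="/legion-7/dp/EXAMPLE2">Legion 7</a>
</div>
"""


def extract_links(markup):
    """Return the href of every anchor whose class attribute matches exactly."""
    # Naming the parser explicitly is what removes the UserWarning.
    soup = BeautifulSoup(markup, "html.parser")
    anchors = soup.find_all("a", {"class": "a-size-medium a-color-base a-text-normal"})
    return [a.get("href") for a in anchors]


links = extract_links(SAMPLE_HTML)
print(links)
```

In your spider the equivalent change would be BeautifulSoup(plain_txt, "html.parser"). Note that this only addresses the warning; whether the downloaded page actually contains anchors with that class is a separate question.
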
answered 2020-05-08T16:35:51.030