
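The script runs fine locally, but when I deploy it to Heroku, requests.get() fails to open the Amazon link and returns <Response [503]>. The script then stops with the error "AttributeError: 'NoneType' object has no attribute 'get_text'" (I think this is because requests.get() failed to open the Amazon link). How can I fix this so that requests.get() returns <Response [200]> on Heroku?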

import time

import requests
from bs4 import BeautifulSoup

url = "here is written the Amazon product link"

while True:
    # Send a browser-like User-Agent; works locally, but returns 503 on Heroku
    req = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0'})
    soup = BeautifulSoup(req.content, "lxml")
    name = soup.find(id="productTitle", class_="a-size-large product-title-word-break")
    price = soup.find(id="priceblock_ourprice", class_="a-size-medium a-color-price priceBlockBuyingPriceString")
    # find() returns None when the element is absent (e.g. on the 503 page),
    # so the get_text() calls below raise AttributeError in that case
    name = name.get_text()
    price = price.get_text()
    print("New product: ", name, price)  # Python 3 print() call
    time.sleep(10)
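One way to make the loop fail more gracefully is to check the status code before parsing, retry with increasing delays on a non-200 response, and never call get_text() on a None result. Below is a minimal sketch, assuming the 503 is transient throttling that a few delayed retries might clear. Amazon may answer automated traffic from cloud hosts with a 503 regardless, so retries alone may not be enough. fetch_with_backoff and text_or_none are hypothetical helpers, not part of requests or BeautifulSoup; get is any callable returning an object with a status_code attribute, such as a requests.Response.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(
    get: Callable[[str], object],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[object]:
    """Hypothetical helper: call get(url) until it returns status 200.

    Assumes get returns an object with a status_code attribute.
    Returns None if every attempt fails.
    """
    for attempt in range(max_attempts):
        response = get(url)
        if response.status_code == 200:
            return response
        # Non-200 (e.g. 503): wait 5 s, 10 s, 20 s, ... before retrying,
        # but do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None


def text_or_none(tag) -> Optional[str]:
    """Hypothetical helper: stripped text of a tag, or None if find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else None
```

In the question's loop, fetch_with_backoff(lambda u: requests.get(u, headers={...}), url) would replace the bare requests.get() call (skipping the iteration when it returns None), and text_or_none(soup.find(...)) would replace the direct get_text() calls.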

My requirements file:

bs4==0.0.1
requests==2.22.0
lxml==4.5.2

My Procfile:

web: python "test.py"

I tried requests.get() with other links and it worked, so I think the problem is specific to Amazon.


1 Answer


I tried to write code that collects Google search results as links:

import bs4
import requests

searchinput = "dogs"
print('Searching...')

google_search = requests.get('https://www.google.com/search?q=' + searchinput)
soup = bs4.BeautifulSoup(google_search.text, 'html.parser')

# <a> elements reached from <div id="main"> through three direct-child <div> levels
search_results = soup.select('div#main > div > div > div > a')
#for each in search_results:
#    print(each)
#    print("-----------------------")
#print(len(search_results))
linksList = []
......

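It continues with string manipulation. I ran into the same problem with requests. First, you can try my code for requests and BeautifulSoup. Maybe you have the same problem with requests.get() that I had; sometimes changing the format of the request helps. I also hit the same problem after sending too many requests to the same website. I guess the connection gets blocked when you do that. You can restart your router; that worked for me.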
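The point about too many requests to the same website can be addressed by spacing requests out. Below is a minimal sketch, assuming the blocking is rate-based; if the site blocks the IP outright, spacing requests will not help. Throttle is a hypothetical helper, and its clock and sleep parameters are injectable only so it can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Hypothetical helper: keep successive wait() calls at least min_interval seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous wait() return

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Sleep only for the part of the interval not already elapsed
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Usage: create throttle = Throttle(10.0) before the loop and call throttle.wait() right before each requests.get(). Unlike a plain time.sleep(10), this counts the time the request and parsing already took toward the interval.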

Please let me know whether one of these methods solves your problem.

answered 2020-08-11T14:42:10.587