
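I'm using mechanize to perform a Bing search, and then I process the results with Beautiful Soup. I've successfully done Google and Yahoo searches the same way, but when I run a Bing search, all I get back is a blank page.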

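I'm completely puzzled as to why this happens, and I'd be very grateful if anyone could shed some light on it. Here is a sample of the code I'm using: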

from BeautifulSoup import BeautifulSoup
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)
br.open("http://www.bing.com/search?count=100&q=cheese")
content = br.response()
content = content.read()
soup = BeautifulSoup(content, convertEntities=BeautifulSoup.ALL_ENTITIES)
print soup

All that gets printed is a blank line.

2 Answers

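You are probably getting a response whose answer is already in your browser cache. Try changing the query string a little, for example by reducing count to 50.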

You can also add some debugging code and look at the headers the server returns:

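import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
br.open("http://www.bing.com/search?count=50&q=cheese")
response = br.response()
headers = response.info()
print(headers)  # the headers returned by the server
content = response.read()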

Edit:

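I tried this query with count=100 in Firefox and Opera, and it seems Bing doesn't like such a "large" count. When I reduce the count, it works. So this is not a bug in mechanize or any other Python library; the problem is how your query interacts with Bing. It seems a browser can also query Bing with count=100, but only after it has first queried Bing with some smaller count. Strange!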
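The workaround from the edit (retry with a smaller count when Bing returns a blank page) can be sketched as below. This is a hedged sketch that has not been checked against live Bing; the helper names `bing_search_url` and `fetch_with_fallback`, and the particular list of counts to try, are hypothetical choices for illustration.

```python
from urllib.parse import urlencode


def bing_search_url(query, count):
    # Same URL shape as in the question: http://www.bing.com/search?count=...&q=...
    return "http://www.bing.com/search?" + urlencode({"count": count, "q": query})


def fetch_with_fallback(fetch, query, counts=(100, 50, 20, 10)):
    """Return (count, body) for the first count whose response body is not blank.

    `fetch` is assumed to be any callable that takes a URL and returns the
    response body as a str (e.g. a thin wrapper around mechanize or requests).
    The counts tried are an arbitrary choice for this sketch.
    """
    for count in counts:
        body = fetch(bing_search_url(query, count))
        if body.strip():  # treat empty or whitespace-only bodies as a blank page
            return count, body
    return None, ""  # every attempt came back blank
```

With mechanize under Python 3, `fetch` could be something like `lambda url: br.open(url).read().decode("utf-8", "replace")`, where `br` is a `mechanize.Browser()` with `set_handle_robots(False)`. Passing the fetcher in as a callable keeps the retry logic separate from the HTTP library, so it can be exercised without touching the network.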

answered 2010-11-19T12:30:23.750

Another way to do this is to use requests with beautifulsoup.

Code and example in the online IDE:

from bs4 import BeautifulSoup  # the 'lxml' parser below also needs the lxml package installed
import requests, json

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}


def get_organic_results():
  html = requests.get('https://www.bing.com/search?q=nfs', headers=headers)
  soup = BeautifulSoup(html.text, 'lxml')

  bing_data = []

  # every organic result is an <li class="b_algo">
  for result in soup.find_all('li', class_='b_algo'):
    h2 = result.h2
    title = h2.text if h2 else None
    link = h2.a.get('href') if h2 and h2.a else None

    attribution = result.find('div', class_='b_attribution')
    displayed_link = attribution.text if attribution else None

    caption = result.find('div', class_='b_caption')
    snippet = caption.p.text if caption and caption.p else None

    # inline links: search inside this result only, not the whole page
    inline = []
    for row in result.find_all('div', class_='b_factrow'):
      a = row.a
      inline.append({
        'title': a.text if a else None,
        'link': a.get('href') if a else None
      })

    # one entry per result, appended after the inline loop has finished
    bing_data.append({
      'title': title,
      'link': link,
      'displayed_link': displayed_link,
      'snippet': snippet,
      'inline': inline
    })

  print(json.dumps(bing_data, indent = 2))


get_organic_results()

# part of the generated JSON output:
'''
[
  {
    "title": "Need for Speed Video Games - Official EA Site",
    "link": "https://www.ea.com/games/need-for-speed",
    "displayed_link": "https://www.ea.com/games/need-for-speed",
    "snippet": "Need for Speed Forums Buy Now All Games Forums Buy Now Learn More Buy Now Hit the gas and tear up the roads in this legendary action-driving series. Push your supercar to its limits and leave the competition in your rearview or shake off a full-scale police pursuit \u2013 it\u2019s all just a key-turn away.",
    "inline": [
      {
        "title": null,
        "link": null
      }
    ]
  }
]
'''

Alternatively, you can do the same thing with the Bing Organic Results API from SerpApi. It's a paid API with a free trial of 5,000 searches.

Code to integrate:

from serpapi import GoogleSearch  # provided by the google-search-results package
import os

def get_organic_results():
  params = {
    "api_key": os.getenv('API_KEY'),
    "engine": "bing",
    "q": "nfs most wanted"
  }

  search = GoogleSearch(params)
  results = search.get_dict()

  for result in results.get('organic_results', []):
    title = result.get('title')
    link = result.get('link')
    displayed_link = result.get('displayed_link')
    snippet = result.get('snippet')  # not every result has a snippet
    inline = result.get('sitelinks', {}).get('inline')  # nor inline sitelinks
    print(f'{title}\n{link}\n{displayed_link}\n{snippet}\n{inline}\n')

get_organic_results()

# part of the output:
'''
Need for Speed: Most Wanted - Car Racing Game - Official ...
https://www.ea.com/games/need-for-speed/need-for-speed-most-wanted
https://www.ea.com/games/need-for-speed/need-for-speed-most-wanted
Jun 01, 2017 · To be Most Wanted, you’ll need to outrun the cops, outdrive your friends, and outsmart your rivals. With a relentless police force gunning to take you down, you’ll need to make split-second decisions. Use the open world to …
[{'title': 'Need for Speed No Limits', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-no-limits'}, {'title': 'Buy Now', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-heat/buy'}, {'title': 'Need for Speed Undercover', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-undercover'}, {'title': 'Need for Speed The Run', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-the-run'}, {'title': 'News', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-payback/news'}]
'''

Disclaimer: I work for SerpApi.

answered 2021-06-17T18:41:55.893