python - Automating web searches with Python

Question

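I'd like to automate something I've been doing by hand: going to a website and searching repeatedly. In particular, I've been going to this website, scrolling down near the bottom, clicking the "Upcoming" tab, and searching for various cities.

I'm new to Python, and I'd like to be able to just type in a list of cities to search for and get output that aggregates all the search results. So, for instance, the following functionality would be great: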

cities = ['NEW YORK, NY', 'LOS ANGELES, CA']
print(getLocations(cities))

which would print

Palm Canyon Theatre PALM SPRINGS, CA    01/22/2016  02/07/2016
...

and so on, listing all search results within a 100-mile radius of each city entered.

I tried looking at the documentation for the requests module, and then I ran

r = requests.get('http://www.tamswitmark.com/shows/anything-goes-beaumont-1987/')
r.content

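It printed all the HTML of the web page, so that feels like a small victory, although I'm not sure what to do with it.

Any help would be greatly appreciated, thank you.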

score 1 · Accepted Answer

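You have two questions rolled into one, so here is a partial answer to get you started. The first task involves HTML parsing, so let's use the Python libraries requests and beautifulsoup4 (pip install beautifulsoup4, in case you don't have it yet).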

import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.tamswitmark.com/shows/anything-goes-beaumont-1987/')
soup = BeautifulSoup(r.content, 'html.parser')
rows = soup.findAll('tr', {"class": "upcoming_performance"})

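soup is a navigable data structure of the page content. We use the findAll method on soup (find_all is the current spelling of the same method) to extract the 'tr' elements with class 'upcoming_performance'. A single element in rows looks like this: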

print(rows[0])  # debug statement to examine the content
"""
<tr class="upcoming_performance" data-lat="47.6007" data-lng="-120.655" data-zip="98826">
<td class="table-margin"></td>
<td class="performance_organization">Leavenworth Summer Theater</td>
<td class="performance_city-state">LEAVENWORTH, WA</td>
<td class="performance_date-from">07/15/2015</td>
<td class="performance_date_to">08/28/2015</td>
<td class="table-margin"></td>
</tr>
"""

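Now let's extract the data from these rows into our own data structure. For each row, we'll create a dictionary for that performance.

The data-* attributes of each tr element are available through dictionary key lookup.

The 'td' elements inside each tr element can be accessed using the .children (or .contents) attribute.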

performances = []  # list of dicts, one per performance
for tr in rows:
    # extract the data-* using dictionary key lookup on tr 
    p = dict(
        lat=float(tr['data-lat']),
        lng=float(tr['data-lng']),
        zipcode=tr['data-zip']
    )
    # extract the td children into a list called tds
    tds = [child for child in tr.children if child != "\n"]
    # the class of each td indicates what type of content it holds
    for td in tds:
        key = td['class'][0]  # get first element of class list
        p[key] = td.string  # get the string inside the td tag
    # add to our list of performances
    performances.append(p)
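
To try the extraction without hitting the live site, the same idea can be run against the sample row printed earlier. This is only a sketch: the helper name parse_performances is hypothetical, and it uses find_all('td') and get_text instead of filtering .children, which is a substitution on my part rather than what the loop above does. It also skips the empty table-margin cells.

```python
from bs4 import BeautifulSoup

# Sample row copied from the debug output shown earlier, wrapped in a table.
SAMPLE_HTML = """
<table>
<tr class="upcoming_performance" data-lat="47.6007" data-lng="-120.655" data-zip="98826">
<td class="table-margin"></td>
<td class="performance_organization">Leavenworth Summer Theater</td>
<td class="performance_city-state">LEAVENWORTH, WA</td>
<td class="performance_date-from">07/15/2015</td>
<td class="performance_date_to">08/28/2015</td>
<td class="table-margin"></td>
</tr>
</table>
"""


def parse_performances(html):
    """Return one dict per upcoming_performance row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    performances = []
    for tr in soup.find_all("tr", class_="upcoming_performance"):
        # data-* attributes are read with dictionary-style lookup on the tag
        p = {
            "lat": float(tr["data-lat"]),
            "lng": float(tr["data-lng"]),
            "zipcode": tr["data-zip"],
        }
        # find_all("td") returns only tags, so no newline strings to filter out
        for td in tr.find_all("td"):
            key = td["class"][0]  # class is a list; take its first entry
            if key == "table-margin":
                continue  # spacer cells carry no data
            p[key] = td.get_text(strip=True)
        performances.append(p)
    return performances


performances = parse_performances(SAMPLE_HTML)
print(performances[0]["performance_organization"])
```

Passing r.content from the request above to parse_performances should give the same kind of list for the real page, assuming its markup still matches the sample row.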

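At this point, we have a list of performance dictionaries. The keys in each dictionary are:

lat: float

lng: float

zipcode: str

performance_city-state: str

performance_organization: str

etc.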
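
Given dictionaries with these keys, one line of the output the question asks for could be produced along these lines. The function name format_performance and the tab-separated layout are assumptions for illustration. The two date keys are taken from the class names in the sample row above, which is why one uses a hyphen and the other an underscore.

```python
def format_performance(p):
    """Build one tab-separated output line from a performance dict (hypothetical helper)."""
    fields = (
        p["performance_organization"],
        p["performance_city-state"],
        p["performance_date-from"],
        p["performance_date_to"],
    )
    return "\t".join(fields)


# Example dict with the values from the sample row shown earlier.
row = {
    "lat": 47.6007,
    "lng": -120.655,
    "zipcode": "98826",
    "performance_organization": "Leavenworth Summer Theater",
    "performance_city-state": "LEAVENWORTH, WA",
    "performance_date-from": "07/15/2015",
    "performance_date_to": "08/28/2015",
}

print(format_performance(row))
```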

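The HTML extraction is done. Your next step is to use a mapping API service to compare the distance from your desired location to the lat/lng values in performances. For example, you might choose to use the Google Maps Geocoding API. There are plenty of existing answered questions on SO to guide you.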
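
As a rough illustration of the distance step, here is a sketch that leaves the geocoding service out and uses the haversine great-circle formula with hand-entered coordinates. The CITY_COORDS table, its approximate coordinates, and the function names are assumptions made for illustration; a real version would get each city's coordinates from a geocoding API. Straight-line distance is also only an approximation of whatever the site's own radius search does.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Hypothetical lookup table with approximate city-center coordinates;
# a geocoding API would normally supply these.
CITY_COORDS = {
    "NEW YORK, NY": (40.7128, -74.0060),
    "LOS ANGELES, CA": (34.0522, -118.2437),
    "SEATTLE, WA": (47.6062, -122.3321),
}


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def performances_near(city, performances, radius_miles=100):
    """Keep the performances whose lat/lng lies within radius_miles of city."""
    lat, lng = CITY_COORDS[city]
    return [
        p for p in performances
        if haversine_miles(lat, lng, p["lat"], p["lng"]) <= radius_miles
    ]


# One performance dict, using the lat/lng from the sample row shown earlier.
sample = [
    {
        "lat": 47.6007,
        "lng": -120.655,
        "performance_organization": "Leavenworth Summer Theater",
        "performance_city-state": "LEAVENWORTH, WA",
    }
]

for city in ["SEATTLE, WA", "LOS ANGELES, CA"]:
    nearby = performances_near(city, sample)
    print(city, [p["performance_organization"] for p in nearby])
```

Looping over the list of cities and concatenating the filtered results would give something close to the getLocations function described in the question.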
