python - Getting data from a website's subpages with Python

Question

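I'm trying to build a bot that collects market links from Steam, but I've run into a problem. I can return all the data from a single page, but when I try to fetch multiple pages I just get copies of the first page, even though I'm giving it working links (for example http://steamcommunity.com/market/search?q=appid%3A753#p1 and then http://steamcommunity.com/market/search?q=appid%3A753#p2). I have tested these links and they work in my browser. Here is my code.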

import urllib2
import random
import time

start_url = "http://steamcommunity.com/market/search?q=appid%3A753"
end_page = 3
urls = []

def get_raw(url):
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    return response.read()

def get_market_urls(html):
    index = 0
    while index != -1:
        index = html.find("market_listing_row_link", index+25)
        beg = html.find("http", index)
        end = html.find('"',beg)
        print html[beg:end]
        urls.append(html[beg:end])

def go_to_page(page):
    return start_url+"#p"+str(page)

def wait(min, max):
    wait_t = random.randint(min,max)
    time.sleep(wait_t)

for i in range(end_page):
    url = go_to_page(i+1)
    raw = get_raw(url)
    get_market_urls(raw)

score 1 · Accepted Answer

Your problem is that you have misunderstood what the URL contains.

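The number after the hash sign does not make it a different URL that can be fetched. That part is called the fragment identifier (not the query string, which is the part after the ?). The fragment is never sent to the server, so every request returns the same first page. On this particular page, the fragment tells the JavaScript which page to load via AJAX, which is why the links work in a browser.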
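To see why both links fetch the same document, here is a minimal sketch that splits the two URLs from the question into the part that is actually requested and the fragment. It uses Python 3's urllib.parse (in Python 2, as in the question, the same function lives in the urlparse module); the variable names are only illustrative.

```python
from urllib.parse import urldefrag

page1 = "http://steamcommunity.com/market/search?q=appid%3A753#p1"
page2 = "http://steamcommunity.com/market/search?q=appid%3A753#p2"

# urldefrag returns the URL without its fragment, plus the fragment itself.
request1, fragment1 = urldefrag(page1)
request2, fragment2 = urldefrag(page2)

print(fragment1, fragment2)   # the only part that differs
print(request1 == request2)   # the requested URL is identical
```

Only the fragments differ, so from the server's point of view both links are the same request.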

In any case, you should take a look at this URL: http://steamcommunity.com/market/search/render/?query=appid%3A753&start=00&count=10. You can use the start=00 and count=10 parameters to get the results you want.
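As a rough sketch of how paging with those parameters might look, the helper below builds one render URL per page. It assumes that start is an offset into the listing and count is the page size, which is inferred from the parameter names rather than confirmed; render_page_url and per_page are hypothetical names, and the code is written for Python 3.

```python
from urllib.parse import urlencode

RENDER_URL = "http://steamcommunity.com/market/search/render/"

def render_page_url(page, per_page=10):
    """Build the render URL for a 1-based page number (hypothetical helper).

    Assumption: 'start' is an item offset and 'count' is the page size.
    """
    params = {
        "query": "appid:753",              # urlencode escapes ':' as %3A
        "start": (page - 1) * per_page,
        "count": per_page,
    }
    return RENDER_URL + "?" + urlencode(params)

for page in range(1, 4):
    print(render_page_url(page))
```

Each URL produced this way differs in its query string, so each one is a distinct request, unlike the #p1 and #p2 links.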

Enjoy.
