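I need to do some scraping on a website after submitting a search form. The problem is that when I do this in a browser, the page neither reloads nor redirects anywhere: the results appear below the search form and the URL does not change, although I can see them in the "new" page's HTML. But when I use the following code, I cannot see the "new" page's HTML that should be in the response (the link provided is the one I am actually trying to use):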

import mechanicalsoup

def fetchfile(query):

    url = "http://www.italgiure.giustizia.it/sncass/"

    browser = mechanicalsoup.Browser()
    page = browser.get(url)
    search_form = page.soup.find("form", {"id": "z-form"})
    search_form.find("input", {"id":"searchterm"})["value"] = query
    response = browser.submit(search_form, page.url)

    print(response) # the response is 200, so it should be a good sign

    # actual parsing will come later...
    print("1235" in response.text) # quick-check to see if there is what I'm looking for, but I get False

    # in fact this...
    print(page.text == response.text) # ...gives me True

fetchfile("1235/2012")

I don't understand what I'm missing. I would rather not use Selenium. Any clues?

1 Answer

I just solved this same problem. I'm also quite new to Python, so let me try to explain.

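You are "finding" the element on the page, but you need to take the result of the form lookup and convert it into a Form object; then you can set values on the Form object and submit it. The reason you get nothing back after submitting is that none of your form values were actually set; you were only doing a lookup. I know this question is old, but hopefully this helps others too. I don't know what the actual value of "query" should be, so I can't verify that it works, but this is the approach I used in my own program.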

import mechanicalsoup

def fetchfile(query):

    url = "http://www.italgiure.giustizia.it/sncass/"

    browser = mechanicalsoup.Browser()
    page = browser.get(url)

    # browser.get() returns a requests Response; the parsed HTML is in page.soup.
    # Using page.soup.find() with the appropriate attributes is also useful
    # for forms without names
    FORM = mechanicalsoup.Form(page.soup.find('form', attrs={'id': 'z-form'}))

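    # Form looks fields up by their name attribute; this assumes the input's
    # name is "searchterm" (the question only shows its id)
    FORM["searchterm"] = query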

    # You can verify the form values are set by doing this:
    print("Form values: ", vars(FORM))

    response = browser.submit(FORM, url)

    print(response) # the response is 200, so it should be a good sign
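    # Browser has no get_current_page() (that is a StatefulBrowser method);
    # submit() attaches the parsed page to the response as .soup
    Results = response.soup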
    print("Results: ", Results)

    # actual parsing will come later...
    # quick check from the question: is the search term in the response?
    # print("1235" in response.text)

    # check from the question: True here means the page came back unchanged
    print(page.text == response.text)

# fetchfile("1235/2012")
Answered 2017-09-02T09:11:16.980