python - 如何使用此页面上的表格使机械化不失败？

Question

import mechanize

url = 'http://steamcommunity.com'

br=mechanize.Browser(factory=mechanize.RobustFactory())

br.open(url)
print br.request
print br.form
for each in br.forms():
    print each
    print

上面的代码导致：

Traceback (most recent call last):
  File "./mech_test.py", line 12, in <module>
    for each in br.forms():
  File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 426, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 559, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 228, in forms
mechanize._html.ParseError

我的具体目标是使用登录表单，但我什至无法机械化来识别有任何表单。即使使用我认为是选择任何 形式的最基本方法，也会br.select_form(nr=0)导致相同的回溯。表单的 enctype 是 multipart/form-data 如果有区别的话。

我想这一切都归结为一个两部分的问题：我怎样才能让机械化来处理这个页面，或者如果不可能，在维护 cookie 的同时还有什么方法？

编辑：如下所述，这将重定向到“ https://steamcommunity.com ”。

Mechanize 可以成功检索 HTML，如下代码所示：

url = 'https://steamcommunity.com'

hh = mechanize.HTTPSHandler()  # you might want HTTPSHandler, too
hh.set_http_debuglevel(1)
opener = mechanize.build_opener(hh)
response = opener.open(url)
contents = response.readlines()

print contents

score 2 · Accepted Answer

使用这个秘密，我相信这对你有用；）

br = mechanize.Browser(factory=mechanize.DefaultFactory(i_want_broken_xhtml_support=True))

score 2 · Accepted Answer

您是否提到该网站正在重定向到 https (ssl) 服务器？

好吧，尝试像这样设置一个新的 HTTPS 处理程序：

mechanize.HTTPSHandler()

python - 如何使用此页面上的表格使机械化不失败？

2 回答 2

Related

Reference