
I'm making a web parser, and some hrefs are driving me crazy.

import urllib.request

resp = urllib.request.urlopen("http://portogruaro.trasparenza-valutazione-merito.it/storico-atti")
page = resp.read().decode('utf-8')
print(page)

In the downloaded page I found this:

<a.. href="http://portogruaro.trasparenza-valutazione-merito.it/storico-atti;jsessionid=BE0A764D125947680F3DC6F85760302A?p_p_id=ConsultazioneAtti_WAR_maggioliportalmasterdetailportlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_resource_id=downloadAllegato&p_p_cacheability=cacheLevelPage&p_p_col_id=column-1&p_p_col_count=1&_ConsultazioneAtti_WAR_maggioliportalmasterdetailportlet_downloadTicket=oMrkWCwhyKWGcD67RyUPTMNzDbwk8ufAwUFVQ2_3Z4045lXXp1gcrKnaH7my84lD0jmgn_na5l1a5KnBtXxYtJYH7rbRP4GRdD53nB0MaBJSV6Ub1JDNoMnspbc2nmqr7a3ucdsOOBOUc4q0uTPd1Dg5ba1VE8DJ1kpf6C0eliencVxLYM8jPqxcSVokmrAjHqkHg4K3CFGZP9tGpCBTPQ"><i class="icon-download"></i> Allegato</a>

When the same URL is retrieved with a browser, the href of the same anchor is:

"http://portogruaro.trasparenza-valutazione-merito.it/storico-atti?p_p_id=ConsultazioneAtti_WAR_maggioliportalmasterdetailportlet&p_p_lifecycle=2&p_p_state=normal&p_p_mode=view&p_p_resource_id=downloadAllegato&p_p_cacheability=cacheLevelPage&p_p_col_id=column-1&p_p_col_count=1&_ConsultazioneAtti_WAR_maggioliportalmasterdetailportlet_downloadTicket=HAxoH6d7h0JNRoKoi9sl4R-tsWdtMVoLeeZ8dU5rUQL74MQNMpCnqmBwxX4uNCXuMk4Clb6EzvrIaUXNY0G4q9YGlmebpMDTrR3255v6bLGOiIWVwvbnKiaOoapsGBqwP4JPIUN1R9G8ajAnurCaqTknyMJkVLiKaw0Z4wI61pgAzqjSGHatViGIGIXkrV7IN6EduMl29vAARMvaHhEJ5g"

;jsessionid is added because the bot doesn't manage cookies, but that's not the only change... why?
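The ;jsessionid suffix is the servlet container's URL-rewriting fallback for clients that do not send the session cookie back. A minimal sketch of letting the bot keep cookies with the standard library, assuming the site also sets its session as an ordinary cookie (the helper names make_cookie_opener and fetch are hypothetical):

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar): an opener that stores and resends cookies such as JSESSIONID."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor saves Set-Cookie headers into the jar and
    # adds a Cookie header to every later request made through this opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url):
    """Fetch url through the given opener and decode it using the declared charset."""
    with opener.open(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Usage, assuming network access: `opener, jar = make_cookie_opener()` followed by `page = fetch(opener, "http://portogruaro.trasparenza-valutazione-merito.it/storico-atti")`. On the very first response the server cannot yet know that cookies are accepted, so it may still rewrite the links; a second fetch through the same opener would typically return links without ;jsessionid. This only removes the session suffix; it does not by itself explain the different downloadTicket values.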

EDIT: maybe a particular session number triggers a particular action?

If you download the page, clicking an href from the downloaded copy doesn't work, but clicking the href you see in the browser's page source (view-source:link) does.
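One plausible reading of this behavior (an assumption, not confirmed in this thread) is that each downloadTicket is bound to the session that rendered the page: a ticket scraped by the bot would then be valid only for the bot's session, and a ticket seen in the browser only for the browser's. Under that assumption, the bot should follow the links itself instead of handing them to a browser. A minimal sketch that collects the download links from the fetched HTML with html.parser (DownloadLinkParser and download_links are hypothetical names; matching on p_p_resource_id=downloadAllegato is taken from the hrefs above):

```python
from html.parser import HTMLParser


class DownloadLinkParser(HTMLParser):
    """Collect the href of every <a> tag that targets the downloadAllegato resource."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # HTMLParser already unescapes entities such as &amp; in attribute values.
        if href and "p_p_resource_id=downloadAllegato" in href:
            self.links.append(href)


def download_links(html_text):
    """Return the download hrefs found in html_text, in document order."""
    parser = DownloadLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Each returned URL would then be opened with the same opener object that fetched the page (for example, one created once with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor())` and reused for both requests), so the ticket and the session stay together.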


1 Answer


;jsessionid is added because the bot doesn't manage cookies, but that's not the only change... why?

Well... apart from the download ticket and the jsessionid token, it's the same URL.

The parameters are in a different order, but as far as I know that doesn't change anything. Compare:

_ConsultazioneAtti_WAR_maggioliportalmasterdetailportlet_downloadTicket=oMrkWCwhyKWGcD67RyUPTMNzDbwk8ufAwUFVQ2_3Z4045lXXp1gcrKnaH7my84lD0jmgn_na5l1a5KnBtXxYtJYH7rbRP4GRdD53nB0MaBJSV6Ub1JDNoMnspbc2nmqr7a3ucdsOOBOUc4q0uTPd1Dg5ba1VE8DJ1kpf6C0eliencVxLYM8jPqxcSVokmrAjHqkHg4K3CFGZP9tGpCBTPQ
p_p_cacheability=cacheLevelPage
p_p_col_count=1
p_p_col_id=column-1
p_p_id=ConsultazioneAtti_WAR_maggioliportalmasterdetailportlet
p_p_lifecycle=2
p_p_mode=view
p_p_resource_id=downloadAllegato
p_p_state=normal

_ConsultazioneAtti_WAR_maggioliportalmasterdetailportlet_downloadTicket=HAxoH6d7h0JNRoKoi9sl4R-tsWdtMVoLeeZ8dU5rUQL74MQNMpCnqmBwxX4uNCXuMk4Clb6EzvrIaUXNY0G4q9YGlmebpMDTrR3255v6bLGOiIWVwvbnKiaOoapsGBqwP4JPIUN1R9G8ajAnurCaqTknyMJkVLiKaw0Z4wI61pgAzqjSGHatViGIGIXkrV7IN6EduMl29vAARMvaHhEJ5g
p_p_cacheability=cacheLevelPage
p_p_col_count=1
p_p_col_id=column-1
p_p_id=ConsultazioneAtti_WAR_maggioliportalmasterdetailportlet
p_p_lifecycle=2
p_p_mode=view
p_p_resource_id=downloadAllegato
p_p_state=normal
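The comparison can also be done mechanically. A small sketch with urllib.parse (normalize and diff_urls are hypothetical helper names) that strips the ;jsessionid path parameter, parses each query string into a dict so that parameter order is irrelevant, and lists the keys whose values differ; applied to the two hrefs from the question it should report only the _ConsultazioneAtti_WAR_maggioliportalmasterdetailportlet_downloadTicket key:

```python
from urllib.parse import parse_qs, urlsplit


def normalize(url):
    """Return ((scheme, host, path), query_dict) with any ;jsessionid removed from the path."""
    parts = urlsplit(url)
    # URL rewriting appends ";jsessionid=..." to the path; urlsplit leaves it in parts.path.
    path = parts.path.split(";jsessionid=", 1)[0]
    # parse_qs maps each key to a list of values, so parameter order does not matter.
    return (parts.scheme, parts.netloc, path), parse_qs(parts.query)


def diff_urls(url_a, url_b):
    """Return (same_base, sorted list of query keys whose values differ)."""
    base_a, query_a = normalize(url_a)
    base_b, query_b = normalize(url_b)
    keys = query_a.keys() | query_b.keys()
    differing = sorted(k for k in keys if query_a.get(k) != query_b.get(k))
    return base_a == base_b, differing
```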
answered 2014-08-22T13:07:59.287