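I'm new to Scrapy. I'm scraping a website that has some anchor tags whose href attribute calls a JavaScript SubmitForm function (sysSubmitForm). When I click such a link, the JavaScript opens a page that I need to get data from. Using XPath I found the href of the specific anchor tag, but I can't execute the JavaScript function it contains. Can anyone tell me how to perform the JavaScript form submission behind an anchor tag in Scrapy (Python)? My HTML code is: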
<table class="Tbl" cellspacing="2" cellpadding="0" border="0">
<tbody>
<tr>
<td class="TblOddRow">
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td valign="middle" nowrap="">
<a class="Page" alt="Click to view job description" title="Click to view job description" href="javascript:sysSubmitForm('frmSR1');">Accountant </a>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
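Scrapy does not run JavaScript, so the href is only useful for the form name it passes to sysSubmitForm. Here is a minimal sketch of pulling that name out of an extracted href. It assumes every link follows the sysSubmitForm('formName') pattern shown above; form_name_from_href is a hypothetical helper, not part of Scrapy.

```python
import re
from typing import Optional

# Matches hrefs such as: javascript:sysSubmitForm('frmSR1');
# Assumption: the form name is always the first quoted argument.
SUBMIT_RE = re.compile(r"""sysSubmitForm\(\s*['"]([^'"]+)['"]\s*\)""")


def form_name_from_href(href: str) -> Optional[str]:
    """Return the form name passed to sysSubmitForm, or None if there is none."""
    match = SUBMIT_RE.search(href)
    return match.group(1) if match else None


print(form_name_from_href("javascript:sysSubmitForm('frmSR1');"))
```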
My spider code is:
import scrapy
from scrapy.http import FormRequest


class MountSinaiSpider(scrapy.Spider):
    name = "mountsinai"
    allowed_domains = ["mountsinaicss.igreentree.com"]
    start_urls = [
        "https://mountsinaicss.igreentree.com/css_external/CSSPage_SearchAndBrowseJobs.ASP?T=20120517011617&",
    ]

    def parse(self, response):
        # Submit the job search form on the start page
        return [FormRequest.from_response(response,
                    formdata={ "Type":"CSS","SRCH":"Search Jobs","InitURL":"CSSPage_SearchAndBrowseJobs.ASP","RetColsQS":"Requisition.Key¤Requisition.JobTitle¤Requisition.fk_Code_Full_Part¤[Requisition.fk_Code_Full_Part]OLD.Description(sysfk_Code_Full_PartDesc)¤Requisition.fk_Code_Location¤[Requisition.fk_Code_Location]OLD.Description(sysfk_Code_LocationDesc)¤Requisition.fk_Code_Dept¤[Requisition.fk_Code_Dept]OLD.Description(sysfk_Code_DeptDesc)¤Requisition.Req¤","RetColsGR":"Requisition.Key¤Requisition.JobTitle¤Requisition.fk_Code_Full_Part¤[Requisition.fk_Code_Full_Part]OLD.Description(sysfk_Code_Full_PartDesc)¤Requisition.fk_Code_Location¤[Requisition.fk_Code_Location]OLD.Description(sysfk_Code_LocationDesc)¤Requisition.fk_Code_Dept¤[Requisition.fk_Code_Dept]OLD.Description(sysfk_Code_DeptDesc)¤Requisition.Req¤","ResultSort":"" },
                    callback=self.parse_main_list)]

    def parse_main_list(self, response):
        # Use // instead of /tr so the match works whether or not the HTML contains <tbody>
        firstpage_urls = response.xpath("//table[@class='Tbl']//table//td")
        for link in firstpage_urls:
            # e.g. ["javascript:sysSubmitForm('frmSR1');"]
            hrefs = link.xpath('a/@href').getall()
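Since the link only triggers a form submission, one way around the JavaScript is to send the same request the browser would send when form frmSR1 is submitted. The sketch below does this with the standard library only. It assumes sysSubmitForm submits the named form unchanged; if the site's script edits fields first, those changes must be copied by hand. The page URL, the form action job_detail.asp, and the field req_id in the usage example are hypothetical. The parser is deliberately simplified: it reads only input elements and ignores select and textarea.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormCollector(HTMLParser):
    """Collect the action, method and successful <input> fields of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.found = False
        self.action = ""
        self.method = "GET"
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.found = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self.in_form and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            # Buttons are only sent when clicked; unchecked boxes are not sent at all.
            if kind in ("submit", "button", "image", "reset"):
                return
            if kind in ("checkbox", "radio") and "checked" not in attrs:
                return
            self.fields.append((attrs["name"], attrs.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_submission(page_url, html, form_name):
    """Return (method, url, body) that submitting the named form would send."""
    parser = FormCollector(form_name)
    parser.feed(html)
    if not parser.found:
        raise ValueError(f"form {form_name!r} not found")
    return parser.method, urljoin(page_url, parser.action), urlencode(parser.fields)


# Hypothetical page content, for illustration only.
sample = ('<form name="frmSR1" method="post" action="job_detail.asp">'
          '<input type="hidden" name="req_id" value="123">'
          '<input type="hidden" name="Type" value="CSS"></form>')
print(build_submission("https://example.com/css_external/list.asp", sample, "frmSR1"))
```

Inside a Scrapy callback the same form-filling is available directly: FormRequest.from_response(response, formname="frmSR1", callback=...) builds the request from the named form's fields, and formdata can override any field the JavaScript would have changed.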