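In the example below, I can download both documents (the order and the invoice) without any trouble. However, I can't tell which file is the order and which is the invoice. How can I use HtmlAgilityPack to loop through the page, find "View INVOICE", and then grab the next URL link, so that I can label that document as the invoice?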
Sample HTML page:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
<table><tr><td><a href="http://www.weburl.com"> <img id="Image1" src="http://www.weburl.com/App_Themes/Granite/Images/Logo.jpg" style="border-width:0px;" /></a></td><td> <img src="http://www.weburl.com/images/ALT/Priority.gif" style="border-width:0px;" /></td></tr></table></div><br /><div align="center"><center><table border="0"><tr><td width="100%"><div align="center"><table border="0" cellpadding="0" cellspacing="0"><tr><td colspan="2" align="center" nowrap>Documents on file for order# <b>285006797</b><br /> </td></tr>
<tr><td align="right"><font color="#800000"><b>View ORDER: </font></b></td><td><A HREF="xrecord.asp?s=RK=SOFTWARE\Pegasus Transtech\WebDirect;US=WEBPRI;PW=WEBPRI&docinfo=13409811"target="_blank"> Page 1</A>
<tr><td align="right"><font color="#800000"><b>View INVOICE: </font></b></td><td><A HREF="xrecord.asp?s=RK=SOFTWARE\Pegasus Transtech\WebDirect;US=WEBPRI;PW=WEBPRI&docinfo=13496712"target="_blank"> Page 1</A>
<tr><td align="center" colspan="2"><br />Click the page number to view the documents.<br /><hr></td></tr>
</table></div></td></tr></table>
</body>
</html>
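
To make the goal concrete, here is a minimal sketch of the "find the label, then take the next link" idea. It is written in Python with only the standard library's `html.parser`, not HtmlAgilityPack, so treat it as an illustration of the walk rather than a drop-in answer. The names `DocumentLinkParser`, `find_document_links`, and `LABEL` are made up for this example, and the sample string contains just the two link rows copied from the page above.

```python
import re
from html.parser import HTMLParser

# Two rows copied from the sample page; the raw string keeps the backslashes intact.
SAMPLE = r'''<table>
<tr><td align="right"><font color="#800000"><b>View ORDER: </font></b></td><td><A HREF="xrecord.asp?s=RK=SOFTWARE\Pegasus Transtech\WebDirect;US=WEBPRI;PW=WEBPRI&docinfo=13409811"target="_blank"> Page 1</A>
<tr><td align="right"><font color="#800000"><b>View INVOICE: </font></b></td><td><A HREF="xrecord.asp?s=RK=SOFTWARE\Pegasus Transtech\WebDirect;US=WEBPRI;PW=WEBPRI&docinfo=13496712"target="_blank"> Page 1</A>
</table>'''

# Assumption: every document label has the form "View <TYPE>:".
LABEL = re.compile(r"View\s+(\w+)\s*:")


class DocumentLinkParser(HTMLParser):
    """Pairs each "View <TYPE>:" label with the first link that follows it."""

    def __init__(self):
        super().__init__()
        self.pending_label = None  # label seen, still waiting for its link
        self.links = {}            # maps document type to its href

    def handle_data(self, data):
        match = LABEL.search(data)
        if match:
            self.pending_label = match.group(1).upper()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a" and self.pending_label:
            href = dict(attrs).get("href")
            if href:
                self.links[self.pending_label] = href
                self.pending_label = None


def find_document_links(html_text):
    """Returns a dict such as {"ORDER": href, "INVOICE": href}."""
    parser = DocumentLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


links = find_document_links(SAMPLE)
print(links["INVOICE"])
```

The sketch relies on document order instead of the table structure, because the rows in the sample are never closed (`</td>` and `</tr>` are missing after each link) and a strict sibling lookup may behave differently from parser to parser. The same label-then-next-link walk should carry over to HtmlAgilityPack's node traversal, but that is untested here.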