我得到了一份应该是最新的员工名单,但它与用 ASP.NET 编写的 Intranet People Finder 不匹配。
由于信息很敏感,我无法访问 People Finder 正在使用的数据库,因此我获取信息的唯一方法是从顶部的顶层黄铜开始刮取结构,然后依次遍历每一层。
每个人都有一个员工编号,然后形成 URL http://intranet/peoplefinder/index.aspx?srn=ABC1234
,然后所有向他们报告的人以<a id="gvEmployees_ctl03_lnkFullName" href="index.aspx?srn=ABC4321" target="_self">
每个 URL 指示员工编号并提供指向其团队的链接的格式列在下方。
当团队很大时,问题就出现了,因为分页是在 GridView 中使用 URL 实现的,例如<a href="javascript:__doPostBack('gvEmployees','Page$2')">2</a>
.
我将如何抓取此页面,捕获 SRN 和其他详细信息以及在 GridView 的所有页面上向该人报告的人,然后遍历每个报告人并执行相同的过程,直到整个列表完成?
结果的示例 HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head><title>
People Finder: Name Surname
</title><link rel="stylesheet" href="/path/to/style.css" type="text/css" /><link rel="stylesheet" href="/path/to/anotherStyle.css" type="text/css" />
<script type="text/javascript" src="/path/to/peoplefinder.js"></script>
</head>
<body>
<form name="form1" method="post" action="/path/to/index.aspx" id="form1">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="### ViewState ###" />
</div>
<script type="text/javascript">
<!--
var theForm = document.forms['form1'];
if (!theForm) {
theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
// -->
</script>
<script src="/path/to/WebResource.axd?d=AueXWrgAf8xSxMTAt1Q4AA2&t=633311832634916698" type="text/javascript"></script>
<div class="HP3CHeader">
<div id="LWHPBanner">
<h1><span id="lblName">Name Surname</span></h1>
</div>
</div>
<div id='CPMain'>
<div id="mainBox">
<div id="pnlEmployeeDetails">
<div id='basicData'>
<img id="imgPhoto" class="photo" src="/path/to/photo.jpg" style="height:69px;width:69px;border-width:0px;" />
<span id="lblBusinessUnit">Business Unit</span>
<span id="lblCostCentreName">Cost Centre</span>
<span id="lblLocation">Location</span>
<a href='/path/to/checkcontactdetails.htm' target='_blank' onclick='return OpenCheckContactDetails();' >Find out how to change your details/photo.</a>
<div id="manager">
<strong>Reports to: </strong><a id="hlManager" href="/path/to/index.aspx?srn=ABC1234">Name Surname</a>
</div>
</div>
<div id='contactData'>
<div id="pnlSrn">
<strong>Staff number:</strong> <span id="lblSrn">ABC1234</span>
</div>
<div id="pnlEmailAddress">
<strong>Email Address:</strong> <span id="lblEmailAddress">Email</span>
</div>
<div style="clear: both"></div>
</div>
</div>
<div id="pnlGrid">
<h3><span id="lblGridTitle">Name's team</span></h3>
<div>
<table class="subordinates" cellspacing="0" cellpadding="2" rules="cols" border="1" id="gvEmployees" style="border-style:None;border-collapse:collapse;">
<tr style="color:Black;background-color:#EFF3FB;border-style:None;font-weight:bold;">
<th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$SRN')" style="color:Black;">SRN</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$FullName')" style="color:Black;">Full name</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$RACFID')" style="color:Black;">RACFID</a></th>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl02_lnkFullName" href="index.aspx?srn=1K5932" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl03_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl04_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl05_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl06_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl07_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl08_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl09_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl10_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl11_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="PagerStyle" style="color:#000039;border-style:None;">
<td colspan="3"><table border="0">
<tr>
<td><span>1</span></td><td><a href="javascript:__doPostBack('gvEmployees','Page$2')" style="color:#000039;">2</a></td>
</tr>
</table></td>
</tr>
</table>
</div>
</div>
</div>
<div id="searchBox">
<strong>Search People Finder:</strong>
<br /><br />
<span>Forename:</span><br/>
<span><input name="txtFirstname" type="text" id="txtFirstname" /></span><br/>
<span>Surname:</span><br/>
<span><input name="txtSurname" type="text" id="txtSurname" /></span><br/>
<span>RACFID:</span><br/>
<span><input name="txtRacfid" type="text" id="txtRacfid" /></span><br/>
<span>Staff number:</span><br/>
<span><input name="txtSrn" type="text" id="txtSrn" /></span><br/>
<div class="searchBoxItem" style="text-align:center;width:100%"><input type="submit" name="btnFind" value="Search" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("btnFind", "", false, "", "index.aspx", false, false))" id="btnFind" title="Search for employees member" class="button" style="border-style:Outset;" /></div><br/>
<div>People Finder searches only UK staff.</div>
<!-- <div><a class="execBoardLink" href="/path/to/index.aspx?srn=ABC1234">Show Executive Board</a></div> -->
<div style="margin-top:5px;"><a href="/path/to/phonebook" target="phoneBook" onclick='return OpenPhonebook();' title="Open Phonebook in new window">Open Phonebook</a></div>
</div>
</div>
<div class="contentFooter" style="text-align:center;">
<table width="100%" cellpadding="0" cellspacing="0" border="0" summary="Navigation layout table">
<tr>
<td align="left"><span class="linkArrow"><</span> <a href="javascript:history.back();">Back</a></td>
<td align="center"></td>
<td align="right"><span class="linkArrow">^ </span><a href="#top">Top</a></td>
</tr>
</table>
</div>
<div>
<input type="hidden" name="__PREVIOUSPAGE" id="__PREVIOUSPAGE" value="vy066Txz34y1E515UsTSTDabHKEmdBRCsq7xM0lpJls1" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWCgKM3uTTAgLP/83pDwLfwaTTAQKNguzjCAKt98LeCwLZh62pDwKKqdGpBwLd2q7jAwKa+5aMBAL5zb65C42zY4GBEUKujhjtZ/hZ8sLESfiF" />
</div></form>
</body>
</html>