Here is the HTML:
<div class="ajaxcourseindentfix">
<h3>CPSC 353 - Introduction to Computer Security (3) </h3>
<hr>Security goals, security systems, access controls, networks and security, integrity, cryptography fundamentals, authentication. Attacks: software, network, website; management considerations, security standards in government and industry; security issues in requirements, architecture, design, implementation, testing, operation, maintenance, acquisition, and services.
<br>
<br>Prerequisite: <a href="preview_course_nopop.php?catoid=16&coid=96570" onclick="acalogPopup()">CPSC 253U</a>
<span style="display: none !important"> </span> or <a href="#" onclick="acalogPopup()" target="_blank">CPSC 254</a>
<span style="display: none !important"> </span> and <a href="#" onclick="acalogPopup()" target="_blank">CPSC 351</a>
<span style="display: none !important"> </span>
, declared major/minor in CPSC, CPEN, or CPEI
<br>
</div>
I need to get the following text from this HTML:
From line 6 - or
From line 7 - and
, declared major/minor in CPSC, CPEN, or CPEI
I can get the hrefs [course numbers: CPSC 254, etc.] with the following XPath:
# This XPath gives me all the elements that follow the h3; I then iterate through them in my script.
//div[@class='ajaxcourseindentfix']/h3/following-sibling::text()[2]/following-sibling::*
Update
And then the text with the following XPath:
# This XPath gives me all the text nodes after the h3 tag.
//div[@class='ajaxcourseindentfix']/h3/following-sibling::text()[2]/following-sibling::text()
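I need these course names/prerequisites in the same form as they appear at URL 1.
With this approach I first get all the hrefs and then all the text. Is there a better way to achieve this? I don't want to iterate over two XPaths, first for the hrefs and then for the text, and then combine the results into a prerequisite string.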
1 http://catalog.fullerton.edu/ajax/preview_course.php?catoid=16&coid=99648&show
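
To illustrate the single-pass idea, here is a minimal sketch that reads the link text and the plain text in document order, so nothing has to be merged afterwards. In XPath terms this roughly corresponds to selecting `following-sibling::node()` instead of running separate `*` and `text()` queries. The sketch uses Python's standard `html.parser` rather than an XPath engine. The names `PrerequisiteParser` and `extract_prerequisites` are hypothetical, the course description in the sample is shortened, and the code assumes the prerequisite block starts at the literal label `Prerequisite:` and ends at the closing `div`.

```python
from html.parser import HTMLParser

# Copy of the fragment from the question (course description shortened).
HTML = """<div class="ajaxcourseindentfix">
<h3>CPSC 353 - Introduction to Computer Security (3) </h3>
<hr>Security goals, security systems, access controls.
<br>
<br>Prerequisite: <a href="preview_course_nopop.php?catoid=16&coid=96570" onclick="acalogPopup()">CPSC 253U</a>
<span style="display: none !important"> </span> or <a href="#" onclick="acalogPopup()" target="_blank">CPSC 254</a>
<span style="display: none !important"> </span> and <a href="#" onclick="acalogPopup()" target="_blank">CPSC 351</a>
<span style="display: none !important"> </span>
, declared major/minor in CPSC, CPEN, or CPEI
<br>
</div>"""


class PrerequisiteParser(HTMLParser):
    """Collects all text, in document order, that follows 'Prerequisite:'."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.collecting = False
        self.done = False

    def handle_data(self, data):
        if self.done:
            return
        if not self.collecting:
            # Assumption: the block starts at the literal 'Prerequisite:' label.
            _, found, rest = data.partition("Prerequisite:")
            if not found:
                return
            self.collecting = True
            data = rest
        # Link text and bare text arrive here in the order they appear.
        self.parts.append(data)

    def handle_endtag(self, tag):
        # Assumption: the enclosing div closes the prerequisite block.
        if tag == "div" and self.collecting:
            self.done = True


def extract_prerequisites(html):
    parser = PrerequisiteParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs (including the hidden spans' spaces),
    # then remove the space left in front of the comma.
    text = " ".join("".join(parser.parts).split())
    return text.replace(" ,", ",")


prerequisites = extract_prerequisites(HTML)
print(prerequisites)
```

On the fragment above this yields `CPSC 253U or CPSC 254 and CPSC 351, declared major/minor in CPSC, CPEN, or CPEI`. The same ordering idea should carry over to an XPath-based script: iterate once over the mixed node list and append each element's text or each text node as it comes.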