My code returns only the text from the body of a web page. I'm trying to remove the text of the items with class="menu" from the body of this page:
<div id="pre-header-links-inner" class="header-links"><ul id="menu-top-bar" class="menu"><li id="menu-item-22" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-22"><a href="tel:000-000-0000">Main Line: +1 000-000-0000</a></li>
<li id="menu-item-23" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-23"><a href="tel:100000000000">Sales: tel:000-000-0000</a></li>
<li id="menu-item-24" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-24"><a href="mailto:info@example.com">Email: info@example.com</a></li>
</ul></div>
</div>
</div>
</div>
<!-- #pre-header -->
<div id="header">
<div id="header-core">
<div id="logo">
<a href="https://www.example.com/" class="custom-logo-link" rel="home" itemprop="url"><img width="253" height="50" src="https://www.example.com/logo.png" class="custom-logo" alt="Domain" itemprop="logo" /></a> </div>
<div id="header-links" class="main-navigation">
<div id="header-links-inner" class="header-links">
<ul id="menu-main-navigation" class="menu"><li id="menu-item-71" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-home current-menu-item page_item page-item-2 current_page_item"><a href="https://www.example.com/"><span>Home</span></a></li>
<li id="menu-item-70" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com"><span>About Us</span></a></li>
<li id="menu-item-108" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/services/"><span>Services</span></a></li>
<li id="menu-item-124" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/api/"><span>API</span></a></li>
<li id="menu-item-68" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/contact-us/"><span>Contact Us</span></a></li>
</ul>
</div>
</div>
<!-- #header-links .main-navigation -->
<div id="header-nav"><a class="btn-navbar" data-toggle="collapse" data-target=".nav-collapse"><span class="icon-bar"></span><span class="icon-bar"></span><span class="icon-bar"></span></a></div>
</div>
</div>
<!-- #header -->
<div id="header-responsive"><div id="header-responsive-inner" class="responsive-links nav-collapse collapse"><ul id="menu-main-navigation-1" class=""><li id="res-menu-item-71" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-home current-menu-item page_item page-item-2 current_page_item"><a href="https://example.com/"><span>Home</span></a></li>
<li id="res-menu-item-70" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/about-us/"><span>About Us</span></a></li>
<li id="res-menu-item-108" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/services/"><span>Services</span></a></li>
<li id="res-menu-item-124" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/api/"><span>API</span></a></li>
<li id="res-menu-item-68" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/contact-us/"><span>Contact Us</span></a></li>
</ul></div></div>
<div id="header-sticky">
<div id="header-sticky-core">
<div id="logo-sticky">
<a href="https://www.example.com/" class="custom-logo-link" rel="home" itemprop="url"><img width="253" height="50" src="https://www.example.com/logo.png" class="custom-logo" alt="Logo" itemprop="logo" /></a> </div>
<div id="header-sticky-links" class="main-navigation">
<div id="header-sticky-links-inner" class="header-links">
<ul id="menu-main-navigation-2" class="menu"><li id="menu-item-71" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-home current-menu-item page_item page-item-2 current_page_item"><a href="https://www.example.com/"><span>Home</span></a></li>
<li id="menu-item-70" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/about-us/"><span>About Us</span></a></li>
<li id="menu-item-108" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/services/"><span>Services</span></a></li>
<li id="menu-item-124" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/api/"><span>API</span></a></li>
<li id="menu-item-68" class="menu-item menu-item-type-post_type menu-item-object-page"><a href="https://www.example.com/contact-us/"><span>Contact Us</span></a></li>
</ul>
Strangely, when I call the following line:
text = "".join(tree.xpath("//body//*[not(@class='menu')]//text()")).strip()
it returns the entire plain-text source as-is (i.e., even the text inside the class="menu" elements).
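A minimal sketch of why the not() version keeps the menu text, assuming tree is built with lxml.html.fromstring (the question does not show how it is created) and using a trimmed, hypothetical sample of the markup above. The predicate only rejects the ul element whose class is exactly "menu". Its descendant li, a and span elements do not carry class="menu", so they still match, and //text() then collects the text beneath them.

```python
from lxml import html

# Hypothetical, trimmed sample modelled on the markup in the question.
SAMPLE = """
<html><body>
  <ul class="menu"><li class="menu-item"><a href="#"><span>Home</span></a></li></ul>
  <p>Body text</p>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# The query from the question: every element in <body> whose class is not
# exactly "menu", then all text nodes anywhere beneath those elements.
nodes = tree.xpath("//body//*[not(@class='menu')]//text()")

# Drop whitespace-only text nodes to make the result easier to inspect.
words = [t for t in nodes if t.strip()]

# "Home" is still present: the <li>, <a> and <span> inside the menu all
# pass the not(@class='menu') test.
print(words)
```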
However, when I remove the not keyword:
text = "".join(tree.xpath("//body//*[(@class='menu')]//text()")).strip()
...it correctly identifies the class="menu" elements and isolates their text perfectly:
Main Line: +000-000-0000
Sales: +1 000-000-0000
Email: info@example.com
Home
About Us
Services
API
Contact Us
Home
About Us
Services
API
Contact Us
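The positive query isolates the menus cleanly because @class='menu' compares the whole attribute string: it matches ul class="menu" but not li class="menu-item ..." or the responsive ul class="". That fits the output above listing the navigation links twice even though the markup contains three navigation lists. A minimal sketch on a hypothetical sample; the concat/normalize-space expression is the common XPath 1.0 idiom for testing a single class token and is shown only for comparison.

```python
from lxml import html

# Hypothetical sample: an exact "menu" class, a multi-token class,
# an empty class, and a class that merely starts with "menu".
SAMPLE = """
<html><body>
  <ul id="a" class="menu"></ul>
  <ul id="b" class="menu main"></ul>
  <ul id="c" class=""></ul>
  <li id="d" class="menu-item"></li>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Exact string comparison: only elements whose class attribute is exactly "menu".
exact = tree.xpath("//*[@class='menu']/@id")

# Token comparison: elements whose space-separated class list contains "menu".
token = tree.xpath(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' menu ')]/@id"
)

print(exact)
print(token)
```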
What am I doing wrong? I want it to return the text from everything except the class='menu' elements.
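One way to get that result is to filter the text nodes themselves rather than the elements: keep a text node only if none of its ancestors has class="menu". A minimal sketch, assuming tree comes from lxml.html and that "menu" should be matched exactly as in the question (the token test can be substituted if other class names may appear alongside it). The sample markup is hypothetical. Joining the stripped pieces with a space, instead of "", is a design choice that keeps words from adjacent elements from running together; note that //body//text() would also pick up the contents of any script or style elements in the body.

```python
from lxml import html

# Hypothetical sample: one menu that should be excluded, one content block to keep.
SAMPLE = """
<html><body>
  <div id="pre-header-links-inner" class="header-links">
    <ul id="menu-top-bar" class="menu"><li class="menu-item"><a href="mailto:info@example.com">Email: info@example.com</a></li></ul>
  </div>
  <div id="content"><p>Welcome to the site.</p></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Select every text node in <body> that has no ancestor with class="menu".
nodes = tree.xpath("//body//text()[not(ancestor::*[@class='menu'])]")

# Strip each piece, skip whitespace-only nodes, and join with single spaces.
text = " ".join(t.strip() for t in nodes if t.strip())
print(text)
```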