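I want to get the full text of a web page. Unfortunately, my scraper is also capturing the CSS code. How can I change the code below so that the CSS style code is left out?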
page = " ".join(response.xpath('//body//descendant-or-self::*[not(self::script)]/text()').extract())
Try:
//body//descendant-or-self::*[not(self::script or self::style)]
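I tested it and it works: it excludes both the STYLE and SCRIPT tags. Append /text() as in your original expression to get the text nodes themselves.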