The links are encoded inside another <script> tag. First we find the <script> tag that holds the links, then load that tag's contents into a second BeautifulSoup object:
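import requests
from bs4 import BeautifulSoup

url = 'https://www.cnoocltd.com/col/col32091/index.html'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# The links are markup stored as text inside the <script> tag, so parse that text again
soup2 = BeautifulSoup(soup.select_one('.Introduction script').text, 'lxml')

# Print each href, left-aligned and padded to 40 characters, followed by the link text
for tag in soup2.select('[href]'):
    print('{: <40}{}'.format(tag['href'], tag.text))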
Prints:
/art/2019/7/10/art_32091_15297761.html CNOOC China Limited Signs Cooperation Framework Agreement with Sinopec Corp.
/art/2019/6/7/art_32091_15297108.html CNOOC Limited entered into a Share Purchase Agreement for the Acquisition of 10% equity interest in Arctic LNG 2 LLC
/art/2019/5/23/art_32091_15296778.html CNOOC Limited Announces Appomattox Field Commence Production
/art/2019/4/25/art_32091_15296251.html CNOOC Limited Signed a Heads of Agreement with JSC Novatek for the Arctic LNG 2 Project
/art/2019/4/25/art_32091_15296244.html CNOOC Limited Announces Key Operational Statistics for Q1 2019
/art/2019/4/24/art_32091_15296212.html CNOOC China Limited Signs a Petroleum Contract with PetroChina
/art/2019/4/23/art_32091_15296172.html CNOOC Limited Filed 2018 Annual Report on Form 20-F
/art/2019/4/12/art_32091_15295362.html CNOOC Signs a PSC with Smart Oil
/art/2019/3/21/art_32091_15292499.html Reserves and Production Steadily Expanded Net Profit Significantly Increased
/art/2019/1/29/art_32091_15284836.html CNOOC Limited Announced a New Discovery in UK North Sea
/art/2019/1/23/art_32091_15284095.html CNOOC Limited Announces its 2019 Business Strategy and Development Plan
/art/2019/1/16/art_32091_15283206.html CNOOC Limited Announces Huizhou 32-5 Oilfield Comprehensive Adjustment/Huizhou 33-1 Oilfield Joint Development Project Commences Production
/art/2019/1/2/art_32091_15272711.html CNOOC Limited Announces Egina Field Commenced Production
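The hrefs printed above are relative to the site root. If absolute URLs are needed, one option is `urllib.parse.urljoin` from the standard library. The following is only a minimal sketch: the helper name `to_absolute` is made up for illustration, and the two sample paths are copied from the output above.

```python
from urllib.parse import urljoin

# Page the links were scraped from (same URL as in the snippet above)
BASE_URL = 'https://www.cnoocltd.com/col/col32091/index.html'


def to_absolute(hrefs, base=BASE_URL):
    """Resolve each relative href against the page URL (hypothetical helper)."""
    return [urljoin(base, href) for href in hrefs]


# Two paths taken from the printed output
sample = [
    '/art/2019/7/10/art_32091_15297761.html',
    '/art/2019/6/7/art_32091_15297108.html',
]

for link in to_absolute(sample):
    print(link)
```

Because the hrefs start with `/`, `urljoin` keeps only the scheme and host of the base URL and replaces the path.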
Edit: To get the <option> values:
import requests
from bs4 import BeautifulSoup

url = 'https://www.cnoocltd.com/col/col32091/index.html'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# Select only the <option> tags that carry a value attribute
for option in soup.select('.Introduction option[value]'):
    print(option['value'])
Prints:
/col/col32091/index.html
/col/col32091/index.html
/col/col47345/index.html
/col/col44151/index.html
/col/col28131/index.html
/col/col14041/index.html
/col/col8341/index.html
/col/col8351/index.html
/col/col8361/index.html
/col/col8371/index.html
/col/col8381/index.html
/col/col8391/index.html
/col/col8401/index.html
/col/col8411/index.html
/col/col8421/index.html
/col/col8431/index.html
/col/col8441/index.html
/col/col8451/index.html
/col/col8461/index.html
/col/col8471/index.html
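Note that `/col/col32091/index.html` appears twice in the output above. If each value should appear only once, a small sketch like the following removes repeats while keeping the original order. The helper name `unique_in_order` is hypothetical, and the sample list is a shortened copy of the printed values.

```python
def unique_in_order(values):
    """Drop repeated values, keeping the first occurrence of each (hypothetical helper)."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(values))


# First few values from the printed output, including the repeated one
values = [
    '/col/col32091/index.html',
    '/col/col32091/index.html',
    '/col/col47345/index.html',
    '/col/col44151/index.html',
]

for value in unique_in_order(values):
    print(value)
```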