I am trying to extract all the video links, along with the video names, from a web page. I tried the following code.
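#!/usr/bin/python3
from bs4 import BeautifulSoup
import requests
import urllib.request  # a plain "import urllib" does not reliably expose urllib.request

# Download the raw HTML of the videos page
html = urllib.request.urlopen('https://www.ansible.com/resources/videos').read()
acc_link = BeautifulSoup(html, features="lxml")
# Print the href of every <a> element on the page
for line in acc_link.find_all('a'):
    print(line.get('href'))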
Output:
https://www.ansible.com/?hsLang=en-us
https://www.ansible.com/overview/it-automation?hsLang=en-us
https://www.ansible.com/overview/it-automation?hsLang=en-us
https://www.ansible.com/overview/how-ansible-works?hsLang=en-us
https://www.ansible.com/products/automation-platform?hsLang=en-us
https://www.ansible.com/use-cases?hsLang=en-us
https://www.ansible.com/use-cases/provisioning?hsLang=en-us
https://www.ansible.com/use-cases/configuration-management?hsLang=en-us
https://www.ansible.com/use-cases/application-deployment?hsLang=en-us
https://www.ansible.com/use-cases/continuous-delivery?hsLang=en-us
https://www.ansible.com/use-cases/security-automation?hsLang=en-us
https://www.ansible.com/use-cases/orchestration?hsLang=en-us
https://www.ansible.com/integrations?hsLang=en-us
Example of the HTML source:
<h4><a href="https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista?hsLang=en-us">Ansible Network Automation with Arista CloudVision and Arista Validated Designs</a></h4>
The line above is just one sample from the HTML source of https://www.ansible.com/resources/videos. From it, I want the link https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista and the video name Ansible Network Automation with Arista CloudVision and Arista Validated Designs.
Below is another example. I want the part of the href before the ?, and the text of the a element, i.e. Scale-out Clustering with Tower 3.1.
<h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">Scale-out Clustering with Tower 3.1</a></h4>
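For the "href before the ?" part, one possible approach is to split the URL with the standard library and rebuild it without the query string. This is only a minimal sketch; strip_query is a hypothetical helper name, not something from my original script.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return url without its query string and fragment (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep scheme, host and path; blank out the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(strip_query("https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us"))
```

Using urlsplit rather than a plain string split also drops a trailing #fragment, if a link ever has one.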
Expected output:
Video name: Ansible Network Automation with Arista CloudVision and Arista Validated Designs
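To make the goal concrete, here is a rough sketch of the kind of extraction I have in mind. It assumes every video entry is an a element directly inside an h4, as in the two snippets above; I have not confirmed that this holds for the whole page. It parses an inline sample instead of the live site, uses the built-in "html.parser" backend rather than lxml, and extract_videos is a hypothetical function name. The "Home" link text in the sample is made up to stand in for a navigation link.

```python
from bs4 import BeautifulSoup

# The two <h4> snippets from the page, plus one navigation-style link
# (its text "Home" is an assumption) that should be ignored.
SAMPLE_HTML = """
<h4><a href="https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista?hsLang=en-us">Ansible Network Automation with Arista CloudVision and Arista Validated Designs</a></h4>
<h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">Scale-out Clustering with Tower 3.1</a></h4>
<a href="https://www.ansible.com/?hsLang=en-us">Home</a>
"""


def extract_videos(html):
    """Return (name, link) pairs for every <a> that is a direct child of an <h4>."""
    soup = BeautifulSoup(html, "html.parser")
    videos = []
    for a in soup.select("h4 > a[href]"):
        name = a.get_text(strip=True)
        # Keep only the part of the href before the first "?".
        link = a["href"].split("?", 1)[0]
        videos.append((name, link))
    return videos


for name, link in extract_videos(SAMPLE_HTML):
    print(f"Video name: {name}")
    print(f"Link: {link}")
```

Restricting the selector to h4 > a is what filters out the navigation links that my original loop printed.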
Thanks for your help.