python - Using BeautifulSoup in Python to get link names and "select" a link instead of limiting?

Question

I have the following code that tries to return data from some HTML, but I can't get it to return what I need...

import urllib2
from bs4 import BeautifulSoup
from time import sleep

def getData():
    htmlfile = open('C:/html.html', 'rb')
    html = htmlfile.read()
    soup = BeautifulSoup(html)
    items = soup.find_all('div', class_="blocks")
    for item in items:
        links = item.find_all('h3')
        for link in links:
            print link

getData()

This returns the following list:

<h3>
    <a href="http://www.mywebsite.com/titles" title="Click for details(x)">
    TITLE STUFF HERE (YES)
    </a>
</h3>

<h3>
    <a href="http://www.mywebsite.com/titles" title="Click for details(x)">
    TITLE STUFF HERE (MAYBE)
    </a>
</h3>

I would like to return only the titles: TITLE STUFF HERE (YES) and TITLE STUFF HERE (MAYBE)

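I would also like to use something like soup.find_all("a", limit=2), but instead of a "limit" that returns the first two results, I want it to return only the second link... so a "select" feature rather than a limit? (Is there such a feature?)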

score 5 · Accepted Answer

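from bs4 import BeautifulSoup

def getData():
    # Read the saved page; the path is the one used in the question
    with open('C:/html.html', 'rb') as htmlfile:
        html = htmlfile.read()
    # Name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_="blocks")
    for item in items:
        links = item.find_all('a')
        for link in links:
            # Keep only links whose direct parent is an <h3>
            if link.parent.name == 'h3':
                # strip() drops the whitespace surrounding the title text
                print(link.text.strip())

getData()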

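You can also find all the links from the start and check that each link's parent is an h3 and that the parent's parent is a div with the class blocks.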
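A minimal sketch of that alternative, assuming Python 3 and the built-in html.parser. The SAMPLE_HTML string and the get_titles helper are hypothetical names made up for illustration; the markup is modeled on the output shown in the question, with one extra link outside any h3 added to show the filter working.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the output shown in the question.
SAMPLE_HTML = """
<div class="blocks">
  <h3>
    <a href="http://www.mywebsite.com/titles" title="Click for details(x)">
    TITLE STUFF HERE (YES)
    </a>
  </h3>
  <h3>
    <a href="http://www.mywebsite.com/titles" title="Click for details(x)">
    TITLE STUFF HERE (MAYBE)
    </a>
  </h3>
  <a href="http://www.mywebsite.com/other">NOT A TITLE</a>
</div>
"""

def get_titles(html):
    """Return the text of every <a> whose parent is an <h3> inside div.blocks."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for link in soup.find_all("a"):
        h3 = link.parent
        if h3 is None or h3.name != "h3":
            continue  # link is not directly inside an <h3>
        div = h3.parent
        # "class" is multi-valued, so bs4 exposes it as a list of names
        if div is not None and div.name == "div" and "blocks" in div.get("class", []):
            titles.append(link.get_text(strip=True))
    return titles

titles = get_titles(SAMPLE_HTML)
print(titles)

# find_all returns a list-like result, so one item can be picked by index
second = titles[1] if len(titles) > 1 else None
print(second)
```

As for "selecting" only the second link: as far as I know find_all has no argument for that, but since its result behaves like a list, ordinary indexing (index 1 for the second match) should do the job, as the last lines of the sketch show.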
