python - Finding elements by XPath with lxml - Python

Question

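I'm having trouble scraping some web data with lxml. I wanted to scrape something from a website using BeautifulSoup, and I decided to use lxml for it. I wrote some code and got my Discord bot to access the site. The only thing left is the code that finds the elements. Here is my code; any help is appreciated.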

    @tasks.loop(seconds=10)
    async def exchangeRate(self):
        print("Loop Starting!")
        HEADERS = {
            'User-Agent' : "Magic Browser"
        }

        url = 'https://rubyrealms.com/economy/bank'

        async with aiohttp.request("GET", url, headers=HEADERS) as response:
            if response.status == 200:
                #Scrape page content into one variable
                content = await response.text()
                #Initialize soup
                soup = BeautifulSoup(content, "html.parser")
                #Request access to site
                page = requests.get(url)
                #Declaring "tree" - Used to scrape by XPATH
                tree = html.fromstring(page.content)
                stuff = tree.xpath('//*[@id="content-wrap"]/div[3]/div[3]/div[2]/div[1]/div[2]/div[1]/div[2]/div[2]/h4')
                print(stuff)

            else:
                print(f"The request was invalid\nStatus code: {response.status}")

This is my discord.py rewrite task loop; basically, it visits the site every 10 seconds. Everything in the code above works except this part:

stuff = tree.xpath('//*[@id="content-wrap"]/div[3]/div[3]/div[2]/div[1]/div[2]/div[1]/div[2]/div[2]/h4')
print(stuff)

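The only useful thing it prints is "Loop Starting!" from the start of the loop, and the XPath query comes back as an empty list. With the full code above, this is what gets printed: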

Bot is ready for duty!
Exchange Cog is ready!
Waiting for loop!
Loop Starting!
[]

What I want it to show is:

Bot is ready for duty!
Exchange Cog is ready!
Waiting for loop!
Loop Starting!
243

(The number changes every day, which is why I can't just read it once and hard-code it.)

If anyone knows how I can solve this, please help. Thanks in advance.

score 0 · Accepted Answer

There are 7 <h4> tags in tree that match the description in your comment. If I understand you correctly, you can get all 7 with the following:

stuff = tree.xpath('//h4[@data-toggle="tooltip"]')
for s in stuff:
    print(s.text)

This prints the text of each matching <h4> tag, one per line.
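If you want to try that selector without hitting the site, here is a minimal sketch against a made-up HTML document. The markup and the values in SAMPLE are assumptions for illustration only; the real page will differ.

```python
from lxml import html

# Hypothetical markup: the structure and values are invented for this demo.
SAMPLE = (
    "<html><body><div id='content-wrap'>"
    "<h4 data-toggle='tooltip' title='Exchange rate'>246</h4>"
    "<h4 data-toggle='tooltip' title='Something else'>1,024</h4>"
    "<h4>not a tooltip</h4>"
    "</div></body></html>"
)

tree = html.fromstring(SAMPLE)

# Select by attribute instead of by a long positional div[3]/div[2]/... path.
values = [h4.text for h4 in tree.xpath('//h4[@data-toggle="tooltip"]')]
print(values)
```

Matching on the attribute means the query does not depend on how many wrapper <div> elements sit above the <h4>, which a positional path does.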

If you know in advance that your target number (246 in the tree for this post) is always the first one, you can even shorten it to:

stuff = tree.xpath('//h4[@data-toggle="tooltip"]')[0]
print(stuff.text)

This prints only the first match, which is the target number.
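One caveat with indexing by [0]: if the query matches nothing (like the [] in the question), it raises IndexError. Below is a small sketch of a guard around it. The function name first_tooltip_value and the parameter page_source are hypothetical names chosen for this example, not part of lxml.

```python
from typing import Optional

from lxml import html


def first_tooltip_value(page_source: str) -> Optional[str]:
    """Return the stripped text of the first <h4 data-toggle="tooltip">, or None."""
    tree = html.fromstring(page_source)
    matches = tree.xpath('//h4[@data-toggle="tooltip"]')
    if not matches:
        # No matching element: report that instead of raising IndexError.
        return None
    # .text can be None for an empty element, so fall back to "".
    return (matches[0].text or "").strip()


# Made-up documents, for illustration only.
print(first_tooltip_value(
    "<html><body><h4 data-toggle='tooltip'> 246 </h4></body></html>"
))
print(first_tooltip_value("<html><body><p>nothing here</p></body></html>"))
```

In the task loop from the question, you could probably pass the content string you already get from await response.text() to a helper like this, rather than fetching the page a second time with requests.get.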
