python - 打开 xml 文件，在其中找到 url，打开链接并搜索值 - python lxml

Question

我正在使用 lxml 打开一个 xml 文件，并且在保存到新的 xml 文件之前已经做了很多编辑，这一切都很好。在我打开的 xml 中，我有一个链接到网页的 url。在网页中有一些我想在我的打开的 xml 中记录和使用的值。我已经搜索过，但找不到从哪里开始。

亲切的问候。

更新 -

我正在使用下面的代码从我的 xml 中获取 url，这是有效的。然后我可以将所有页面读入数据变量，打印效果很好：

url = tree.find("//video/products/product/read_only_info/read_only_value[@key='storeURL-GB']")
if url is not None:
    url = url.text
    data = urllib2.urlopen(url)
    data = data.read()
    print data

如何找到隐藏在网页中的特定字符串，这是我想要获取的一段网页数据：

<div id="content">

  <div class="padder">

    <div id="title" class="intro">
      <div class="left">
        <h1>This is the title</h1>
        &nbsp;&nbsp;<span rating-system="bbfc" rating-id="37" class="content-rating">15</span>
        <h2>this is more text</h2>
      </div>
      <div class="right">
        <a href="https://rthuere.erwerwer.ghty4e.fdfsdf.com" class="view-more">View More In Sci-Fi &amp; Fantasy</a>

      </div>

我需要获得“在科幻与幻想中查看更多内容”的价值或任何其他价值。

亲切的问候。

score 0 · Accepted Answer

我正在使用下面的代码打开然后搜索特定的文本，这是有效的。

data = urllib2.urlopen(url)
data = data.read()
primaryGenre = data

if "View More In Sci-Fi &amp; Fantasy" in data:
    then do something else

问候。

score 0 · Accepted Answer

如果要获取所有 a 节点的文本，可以使用Beautifulsoup来完成：

soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    print link.text

这回答了你的问题了吗？

python - 打开 xml 文件，在其中找到 url，打开链接并搜索值 - python lxml

2 回答 2

Related

Reference