python - Trying to search EPG XML data

Question

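I'm trying to search an EPG (electronic programme guide) in XML format (xmltv). I want to find all programmes that contain a specific text, for example which channels will show a particular football (soccer) match today. Sample data (the real data has > 20000 elements):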

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv generator-info-name="TX" generator-info-url="http://epg.net:8000/">
<channel id="GaliTV.es">
    <display-name>GaliTV</display-name>
    <icon src="http://logo.com/logos/GaliTV.png"/>
</channel>
<programme start="20210814080000 +0200" stop="20210814085500 +0200" channel="GaliciaTV.es" >
        <title>A Catedral de Santiago e o Mestre Mateo</title>
        <desc>Serie de catedral de Santiago de Compostela.</desc>
    </programme>
    <programme start="20210815050000 +0200" stop="20210815055500 +0200" channel="GaliciaTV.es" >
        <title>santiago</title>
        <desc>Chili.</desc>
    </programme>
</tv>

I want to show the <programme> attributes only when the title or desc element contains a specific text (case-insensitive). Using ElementTree, I tried this:

for title in root.findall("./programme/title"):
   match = re.search(r'Santiago',title.text)
   if match:
       print(title.text)

It finds one result, but:

I get an error that I don't understand:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.7/re.py", line 146, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or buffer

I don't know how to make the search case-insensitive; [Ss]antiago doesn't work.
I'd like to return results from the parent element (e.g. programme.attributes).

score 2 · Accepted Answer

You don't really need regular expressions for this; try

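# doc is the parsed document (tree or root element)
for title in doc.findall('.//programme//title'):
    # title.text is None for an empty <title/>, so fall back to ''
    if "santiago" in (title.text or "").lower():
        print(title.text)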

The output for your sample should be

A Catedral de Santiago e o Mestre Mateo
santiago
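
As for the TypeError in the question: a likely cause (an assumption, since the full feed isn't shown) is a <title> element with no text. ElementTree reports its text as None, and re.search rejects None. If you would rather keep the regex, a minimal sketch could guard against that and pass re.IGNORECASE. The inline sample with its empty title and the helper name matching_titles are made up for illustration:

```python
import re
import xml.etree.ElementTree as ET

# Small made-up sample; the second programme has an empty <title/>,
# whose .text is None in ElementTree.
SAMPLE = """<tv>
  <programme channel="GaliciaTV.es"><title>A Catedral de Santiago e o Mestre Mateo</title></programme>
  <programme channel="GaliciaTV.es"><title/></programme>
  <programme channel="GaliciaTV.es"><title>santiago</title></programme>
</tv>"""


def matching_titles(root, word):
    """Return the text of every programme title containing word, ignoring case."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    found = []
    for title in root.findall("./programme/title"):
        text = title.text or ""  # None (empty element) becomes ''
        if pattern.search(text):
            found.append(text)
    return found


root = ET.fromstring(SAMPLE)
for text in matching_titles(root, "Santiago"):
    print(text)
```

re.escape is there so that a search word containing characters such as "." or "+" is matched literally.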

Edit:

To get all the data from each programme, try:

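for prog in doc.findall('.//programme'):
    # findtext returns '' if the element is missing or has no text
    title = prog.findtext('title', '')
    if "santiago" in title.lower():
        # read the attributes by name instead of relying on their order
        start, stop, channel = prog.get('start'), prog.get('stop'), prog.get('channel')
        desc = prog.findtext('desc', '')
        print(start,stop,channel,'\n',title,'\n',desc)
        print('-----------')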

Output:

20210814080000 +0200 20210814085500 +0200 GaliciaTV.es 
 A Catedral de Santiago e o Mestre Mateo 
 Serie de catedral de Santiago de Compostela.
-----------
20210815050000 +0200 20210815055500 +0200 GaliciaTV.es 
 santiago 
 Chili.
-----------
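
The question also asked to match on either title or desc and to get the parent's attributes back. A rough sketch of that, not a definitive implementation: the function name find_programmes is hypothetical, and the third programme in the inline sample is invented to show a match that occurs only in desc.

```python
import xml.etree.ElementTree as ET

# First two programmes come from the question's sample; the third is invented
# so that there is a match in <desc> only.
SAMPLE = """<tv>
  <programme start="20210814080000 +0200" stop="20210814085500 +0200" channel="GaliciaTV.es">
    <title>A Catedral de Santiago e o Mestre Mateo</title>
    <desc>Serie de catedral de Santiago de Compostela.</desc>
  </programme>
  <programme start="20210815050000 +0200" stop="20210815055500 +0200" channel="GaliciaTV.es">
    <title>santiago</title>
    <desc>Chili.</desc>
  </programme>
  <programme start="20210815060000 +0200" stop="20210815065500 +0200" channel="GaliciaTV.es">
    <title>Noticias</title>
    <desc>Informativo desde SANTIAGO.</desc>
  </programme>
</tv>"""


def find_programmes(root, word):
    """Yield (attributes, title, desc) for programmes whose title or desc contains word."""
    needle = word.lower()
    for prog in root.iter("programme"):
        # findtext gives '' when the child is missing or has no text
        title = prog.findtext("title", default="")
        desc = prog.findtext("desc", default="")
        if needle in title.lower() or needle in desc.lower():
            yield dict(prog.attrib), title, desc


root = ET.fromstring(SAMPLE)
for attrs, title, desc in find_programmes(root, "santiago"):
    print(attrs["channel"], attrs["start"], attrs["stop"], "|", title, "|", desc)
```

Returning the attributes as a dict keeps the lookup by name (start, stop, channel), so it does not depend on the order in which the attributes appear in the file.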

I'd also add that if the XML gets a bit more complex, switching from ElementTree to lxml may be a good idea, since the latter has better XPath support.
