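I am trying to delete an entire <question> element whenever its <answer> contains -66:

I get the following error: TypeError: argument of type 'NoneType' is not iterable ... if element.tag == 'answer' and '-66' in element.text:

What is wrong here? Any help?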

#!/usr/local/bin/python2.7
# -*- coding: UTF-8 -*- 

from lxml import etree

planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>

"""

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html) 

2 Answers

element.text is None on some iterations. The error is saying that it cannot search None for '-66', so first check that element.text is not None, like this:

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():   
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html) 

The line of the XML where it fails is <answer></answer>, which has no text between the tags.
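
To see the failure in isolation, here is a minimal sketch. It assumes Python 3 and uses the standard library's xml.etree.ElementTree instead of lxml (both report None for the text of an empty element). The helper name answer_contains is made up for illustration and is not part of either library.

```python
import xml.etree.ElementTree as ET

def answer_contains(element, needle):
    """Hypothetical helper: True if element is an <answer> whose text contains needle."""
    text = element.text or ""  # an empty element has text None, so fall back to ""
    return element.tag == "answer" and needle in text

empty = ET.fromstring("<answer></answer>")
filled = ET.fromstring("<answer>-66</answer>")

print(empty.text is None)  # True: no text between the tags

try:
    "-66" in empty.text  # the same membership test the question's code performs
except TypeError as exc:
    print("TypeError:", exc)

print(answer_contains(empty, "-66"))   # False
print(answer_contains(filled, "-66"))  # True
```

Falling back to an empty string does the same job as the extra "and element.text" check above; which one reads better is a matter of taste.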


Edit (regarding the second part of your question, about combining tags):

You can use BeautifulSoup like this:

from lxml import etree
import BeautifulSoup

planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>"""

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():   
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)

soup = BeautifulSoup.BeautifulStoneSoup(etree.tostring(html))
print soup.prettify()

This prints:

<questionaire>
 <question>
  <questiontext>
   What's up?
  </questiontext>
  <answer>
  </answer>
 </question>
</questionaire>

Here is a link where you can download the BeautifulSoup module.


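Or, to do this more compactly (note that this XPath matches answers whose text is exactly "-66", rather than merely containing it):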

from lxml import etree
import BeautifulSoup    

# abbreviating to reduce answer length...
planhtmlclear_utf=u"<questionaire>.........</questionaire>"

html = etree.fromstring(planhtmlclear_utf)
[question.getparent().remove(question) for question in html.xpath('/questionaire/question[answer/text()="-66"]')]
print BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)).prettify()
answered 2011-10-08T13:37:16.683

An alternative to checking whether element.text is None is to refine your XPath:

questions = html.xpath('/questionaire/question[answer/text()="-66"]')
for question in questions:
    question.getparent().remove(question)

The brackets [...] mean "such that". So

question                          # find all question elements
[                                 # such that 
  answer                          # it has an answer subelement
    /text()                       # whose text 
  =                               # equals
  "-66"                           # "-66"
]
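
If lxml is not available, the same "such that" idea can be sketched with the standard library. This assumes Python 3 and xml.etree.ElementTree, which supports only a limited XPath subset: the predicate [answer='-66'] selects elements that have an answer child whose complete text equals -66. ElementTree elements have no getparent(), so the removal goes through the known parent (the root) instead.

```python
import xml.etree.ElementTree as ET

XML = """<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>"""

root = ET.fromstring(XML)

# findall() returns a list, so removing children while looping over it is safe.
for question in root.findall("question[answer='-66']"):
    root.remove(question)

remaining = [q.findtext("questiontext") for q in root.findall("question")]
print(remaining)  # ["What's up?"]
```

As with the lxml XPath above, this is an exact match on the answer text, not a substring test.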
answered 2011-10-08T14:04:28.680