
With pyparsing, the opposite effect can be achieved like this:

from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo
#...
removeText = replaceWith("")
scriptOpen, scriptClose = makeHTMLTags("script")
scriptBody = scriptOpen + SkipTo(scriptClose) + scriptClose
scriptBody.setParseAction(removeText)
data = (scriptBody).transformString(data)

How can I keep only the contents of the "table" tag instead?

Update 0:

I tried:

# keep only the table
tableOpen, tableClose = makeHTMLTags("table")
tableBody = tableOpen + SkipTo(tableClose) + tableClose
f = replaceWith(tableBody)
tableBody.setParseAction(f)
data = (tableBody).transformString(data)
print(data)

and I get something like this:

garbages
<input type="hidden" name="cassstx"   value="en_US:frontend"></form></td></tr></table></span></td></tr></table> 

{<"table"> SkipTo:(</"table">) </"table">} 
<div id="asbnav" style="padding-bottom: 10px;">{<"table"> SkipTo:(</"table">) </"table">} 
</div> 
even more garbages

Update 2:

Thanks, Martelli. What I needed was:

from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo
#...
data = 'before<script>ciao<table>buh</table>bye</script>after'

tableOpen, tableClose = makeHTMLTags("table")
tableBody = tableOpen + SkipTo(tableClose) + tableClose
thetable = (tableBody).searchString(data)[0][2]

print(thetable)

1 Answer


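You could first extract the table (much as you are now extracting the script, but of course without removing it ;-), obtaining a thetable string. Then you extract the script, using replaceWith(thetable) instead of replaceWith(''). Alternatively, you could prepare a more elaborate parse action, but the simple two-phase approach looks more straightforward to me. For example (preserving specifically the contents of the table, not the table tags):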

from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo
#...
data = 'before<script>ciao<table>buh</table>bye</script>after'

tableOpen, tableClose = makeHTMLTags("table")
tableBody = tableOpen + SkipTo(tableClose) + tableClose
thetable = (tableBody).searchString(data)[0][2]

removeText = replaceWith(thetable)
scriptOpen, scriptClose = makeHTMLTags("script")
scriptBody = scriptOpen + SkipTo(scriptClose) + scriptClose
scriptBody.setParseAction(removeText)
data = (scriptBody).transformString(data)

print(data)

This prints beforebuhafter (what is outside the script tag, with the contents of the table tag sandwiched in between), hopefully "as desired".
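The two-phase idea does not depend on anything specific to pyparsing. As a rough cross-check, here is a minimal sketch of the same two steps using only the standard library's re module rather than pyparsing. The function name keep_table_contents is hypothetical, and the regular expressions assume simple, non-nested tags, so this is an illustration of the approach rather than a robust HTML parser.

```python
import re

# Flags shared by both patterns: tags may be upper-case and span several lines.
FLAGS = re.IGNORECASE | re.DOTALL


def keep_table_contents(html):
    """Replace every <script>...</script> block with the contents of the first <table>.

    Hypothetical helper: assumes tables are not nested and that tag
    attributes contain no '>' characters.
    """
    # Phase 1: extract the text between the first <table ...> and </table>.
    table = re.search(r"<table\b[^>]*>(.*?)</table>", html, FLAGS)
    if table is None:
        return html  # nothing to keep, leave the input untouched
    table_contents = table.group(1)

    # Phase 2: substitute each script block with the extracted contents.
    # A function is used as the replacement so that backslashes in
    # table_contents are not interpreted as regex escapes.
    return re.sub(
        r"<script\b[^>]*>.*?</script>",
        lambda match: table_contents,
        html,
        flags=FLAGS,
    )


data = 'before<script>ciao<table>buh</table>bye</script>after'
print(keep_table_contents(data))  # beforebuhafter
```

On real pages with nested tables, the non-greedy match stops at the first closing tag, so the pyparsing version above (or a proper HTML parser) is likely the safer choice there.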

Answered 2010-06-13T14:32:27.877