
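Can someone demonstrate how to use a regular expression search to base64-decode specific parts of a string? I want the end result to be the whole string, with only the base64 regions decoded.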

The text inside the category tags and the subcategory tags should be decoded, and then the entire string should be returned.

<attack_headline><site_id>1</site_id><category>U1FMIEluamVjdGlvbg==</category><subcategory>Q2xhc3NpYyBTUUwgQ29tbWVudCAmcXVvdDstLSZxdW90Ow==</subcategory><client_ip>192.168.1.102</client_ip><date>1363807248</date><gmt_diff>0</gmt_diff><reference_id>E711-3EFB-5F43-5FAC</reference_id></attack_headline>
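A minimal regex-only sketch of what the question asks for, assuming each payload uses the standard base64 alphabet with no embedded whitespace. It uses re.sub with a replacement function, so everything outside the two tags is returned unchanged. The names decode_tagged_base64 and B64_TAG are hypothetical, and the sample is a shortened copy of the string above.

```python
import base64
import re

# Matches <category>...</category> or <subcategory>...</subcategory>.
# Group 1 is the tag name, group 2 the base64 payload; the backreference
# \1 forces the closing tag to match the opening one.
B64_TAG = re.compile(r'<(category|subcategory)>([A-Za-z0-9+/=]*)</\1>')


def decode_tagged_base64(text):
    """Return `text` with the contents of <category>/<subcategory> base64-decoded."""
    def _decode(match):
        tag, payload = match.group(1), match.group(2)
        decoded = base64.b64decode(payload).decode('utf-8')
        return f'<{tag}>{decoded}</{tag}>'
    # Only the matched tags are rewritten; the rest of the string is kept as-is.
    return B64_TAG.sub(_decode, text)


sample = ('<attack_headline><site_id>1</site_id>'
          '<category>U1FMIEluamVjdGlvbg==</category>'
          '<subcategory>Q2xhc3NpYyBTUUwgQ29tbWVudCAmcXVvdDstLSZxdW90Ow==</subcategory>'
          '<client_ip>192.168.1.102</client_ip></attack_headline>')
print(decode_tagged_base64(sample))
```

Because the decoded text is inserted verbatim, this yields &quot; where the lxml approach produces &amp;quot;. If a decoded value could contain < or &, the result may no longer be well-formed XML, which is one reason to prefer parsing the XML instead.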

2 Answers


As per my comment, here is an example using lxml.etree. It assumes your input is XML (if it is HTML, use lxml.html instead):

>>> import base64
>>> import lxml.etree
>>> text = "<attack_headline><site_id>1</site_id><category>U1FMIEluamVjdGlvbg==</category><subcategory>Q2xhc3NpYyBTUUwgQ29tbWVudCAmcXVvdDstLSZxdW90Ow==</subcategory><client_ip>192.168.1.102</client_ip><date>1363807248</date><gmt_diff>0</gmt_diff><reference_id>E711-3EFB-5F43-5FAC</reference_id></attack_headline>"
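>>> xml = lxml.etree.fromstring(text)
>>> for tag_with_base64 in ('category', 'subcategory'):
...     node = xml.find(tag_with_base64)
...     # test against None: an element with no children is falsy
...     if node is not None:
...         node.text = base64.b64decode(node.text).decode('utf-8')
...
>>> lxml.etree.tostring(xml, encoding='unicode')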
'<attack_headline><site_id>1</site_id><category>SQL Injection</category><subcategory>Classic SQL Comment &amp;quot;--&amp;quot;</subcategory><client_ip>192.168.1.102</client_ip><date>1363807248</date><gmt_diff>0</gmt_diff><reference_id>E711-3EFB-5F43-5FAC</reference_id></attack_headline>'
answered 2013-03-20T19:53:20.220
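import base64
import re
import lxml.etree

# `client` is assumed to be a SOAP client created elsewhere (not shown)
events = client.service.get_recent_attacks("", epoch_time_last, epoch_time_now, 1, "", 15)
# Remove line breaks (and surrounding whitespace) except those directly after </attack_headline>
text = re.sub(r'(?<!</attack_headline>)\s*\n\s*', '', events)
xml = lxml.etree.fromstring(text)
for tag_with_base64 in ('category', 'subcategory'):
    node = xml.find(tag_with_base64)
    if node is not None:
        node.text = base64.b64decode(node.text).decode('utf-8')
print(lxml.etree.tostring(xml, encoding='unicode'))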
answered 2013-03-20T21:12:59.740
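The regular expression in the second answer keeps the line breaks that directly follow </attack_headline>, which suggests the response may hold several records, one per line; fromstring on the whole text would then reject the extra root elements. A minimal sketch, assuming one complete <attack_headline> record per line (an assumption the answers do not confirm), parses each line separately. It uses the standard library's xml.etree.ElementTree, whose fromstring, find, .text and tostring(..., encoding='unicode') mirror the lxml calls above. decode_records is a hypothetical helper, and the sample records are built from the question's values.

```python
import base64
import xml.etree.ElementTree as ET

BASE64_TAGS = ('category', 'subcategory')


def decode_records(events):
    """Decode the base64 tags in each line-separated <attack_headline> record.

    Hypothetical helper: assumes one complete record per non-empty line.
    """
    decoded = []
    for line in events.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = ET.fromstring(line)
        for tag in BASE64_TAGS:
            node = record.find(tag)
            # Compare with None: an element without children is falsy
            if node is not None and node.text:
                node.text = base64.b64decode(node.text).decode('utf-8')
        # tostring escapes & in text, so a decoded "&quot;" becomes "&amp;quot;"
        decoded.append(ET.tostring(record, encoding='unicode'))
    return decoded


events = (
    '<attack_headline><category>U1FMIEluamVjdGlvbg==</category></attack_headline>\n'
    '<attack_headline><category>U1FMIEluamVjdGlvbg==</category>'
    '<subcategory>Q2xhc3NpYyBTUUwgQ29tbWVudCAmcXVvdDstLSZxdW90Ow==</subcategory></attack_headline>\n'
)
for record in decode_records(events):
    print(record)
```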