I'm trying to adapt a phone number script I wrote a few weeks ago to help a friend. This is the script I'm using as a starting point.

# import regular expressions 
import re
# import argv 
from sys import argv

#arguments to provide at command line 
script, filename = argv

# load and read the file; the with block closes it afterwards
with open(filename) as data:
    read_file = data.read()

# create a regular expression to filter out phone numbers 
phone_finder = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")

# r marks the pattern as a raw string
# \( to match "("
# \d{3} to match 3 digits
# \) to match ")"
# \s* to match zero or more whitespace characters (so no space also matches)
# \d{3} to match 3 digits
# - to match a "-"
# \d{4} to match 4 digits

# print the results
print(phone_finder.findall(read_file))
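
To see what the pattern does and does not accept without needing an input file, here is a minimal sketch that runs the same regular expression on an in-memory string. The sample text and the phone numbers in it are made up for illustration.

```python
import re

# Same pattern as the script above: "(ddd)", optional whitespace, "ddd-dddd".
phone_finder = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")

# Hypothetical sample text, invented for this example.
sample = "Call (555) 123-4567 or (555)987-6543; 555-000-1111 is not matched."

# findall returns every non-overlapping match as a list of strings.
matches = phone_finder.findall(sample)
print(matches)  # ['(555) 123-4567', '(555)987-6543']
```

The third number is skipped because the pattern requires the area code to be wrapped in parentheses.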

He wants a way to search an XML file and find every "<excerpt:encoded><![CDATA[]]></excerpt:encoded>" element, for example

<excerpt:encoded><![CDATA[We love having a frother to make a latte or cappuccino, and think you'll enjoy some hot milk on these cold winter nights to put you to sleep as well.]]></excerpt:encoded>

and replace all instances with

<excerpt:encoded><![CDATA[]]></excerpt:encoded>

But I'm not sure how that would work, because in this second case the text will be different for every instance in the file.

I'm new to Python, so any help would be greatly appreciated. Thanks for your time.

1 Answer

To remove all content from the <excerpt:encoded> elements:

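import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9
from sys import argv

# file to process, given on the command line as in the question's script
script, filename = argv

etree.register_namespace('excerpt', 'your namespace') # to preserve prefix

# read xml
doc = etree.parse(filename)

# clear elements (drops their text, attributes, and children)
for element in doc.iter(tag='{your namespace}encoded'):
    element.clear()

# write xml
doc.write(filename + '.cleared')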

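Replace 'your namespace' with the actual namespace URI that the excerpt prefix refers to, that is, the value of the xmlns:excerpt attribute declared in the file.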
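
One caveat: ElementTree does not preserve CDATA sections, so the cleared elements are written out as empty <excerpt:encoded /> tags rather than the literal <![CDATA[]]> wrapper. If the exact replacement text from the question matters, a regex substitution in the style of the original script is one alternative. The following is only a minimal sketch. It assumes every excerpt is wrapped in exactly one CDATA section with no whitespace or attributes inside the tags. The names EXCERPT, clear_excerpts, and sample are hypothetical and exist only for this example.

```python
import re

# Capture the opening and closing wrappers; the text between them is discarded.
# .*? is non-greedy, so each match stops at the first "]]></excerpt:encoded>".
# re.DOTALL lets "." cross newlines, so multi-line excerpts are matched too.
EXCERPT = re.compile(
    r"(<excerpt:encoded><!\[CDATA\[).*?(\]\]></excerpt:encoded>)",
    re.DOTALL,
)

def clear_excerpts(xml_text):
    """Return xml_text with the CDATA payload of every excerpt emptied."""
    return EXCERPT.sub(r"\1\2", xml_text)

# Hypothetical sample input, loosely modelled on the element in the question.
sample = (
    "<item>\n"
    "<excerpt:encoded><![CDATA[We love having a frother.]]></excerpt:encoded>\n"
    "<excerpt:encoded><![CDATA[Another,\nmulti-line excerpt.]]></excerpt:encoded>\n"
    "</item>"
)

print(clear_excerpts(sample))
```

Because the pattern works on the raw text, it leaves the rest of the file byte-for-byte unchanged, but it will miss elements whose markup deviates from the assumed form.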

Answered 2013-08-20T15:54:51.503