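I want to find a string that starts with "section_" and add it as the value of a keys attribute in the tag on the same line. Example: here is the input, from a .ditamap file.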
<topicref href="xyz/debug_logging_in_xyz-section_i_y_mn.dita"/>
<topicref href="xyz/workflows_id-section_exf_zaz_lo.dita"/>
<topicref href="xyz/images_id-section_ekl_bbz_lo.dita"/>
Expected output:
<topicref href="xyz/debug_logging_in_xyz-section_i_y_mn.dita" keys="section_i_y_mn"/>
<topicref href="xyz/workflows_id-section_exf_zaz_lo.dita" keys="section_exf_zaz_lo"/>
<topicref href="xyz/images_id-section_ekl_bbz_lo.dita" keys="section_ekl_bbz_lo"/>
I understand that BeautifulSoup can be used for this, but I'm new to it and don't know the syntax. Can anyone help?
Here is the code I tried:
import os
from bs4 import BeautifulSoup as bs

globpath = "C:/DATA"  # add your directory path here

def main(path):
    with open(path, encoding="utf-8") as f:
        s = f.read()
    s = bs(s, "xml")
    imgs = s.find_all("topicref")
    for i in imgs:
        if "section" in i["href"]:
            # str.replace treats "*" literally, so neither pattern matches
            i["keys"] = i["href"].replace("*-","").replace(".dita*","")
    s = str(s)
    with open(path, "w", encoding="utf-8") as f:
        f.write(s)

# Process every .ditamap file under globpath
for dirpath, directories, files in os.walk(globpath):
    for fname in files:
        if fname.endswith(".ditamap"):
            path = os.path.join(dirpath, fname)
            main(path)
However, it puts the entire path into the keys attribute. I only need the part that starts with section and ends before .dita.
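As far as I can tell, the whole path ends up in keys because str.replace only matches literal text, so "*-" and ".dita*" are looked for character by character and never found. Below is a minimal sketch of that behaviour, plus one way to cut the key out with plain string methods. The helper name key_from_href is made up for illustration and is not part of the original code.

```python
href = "xyz/workflows_id-section_exf_zaz_lo.dita"

# str.replace has no wildcard support: "*-" and ".dita*" are literal
# substrings that do not occur in href, so nothing is removed.
unchanged = href.replace("*-", "").replace(".dita*", "")


def key_from_href(href):
    """Hypothetical helper: return the part from 'section_' up to '.dita'."""
    start = href.find("section_")
    if start == -1:
        return None  # no section id in this href
    key = href[start:]
    if key.endswith(".dita"):
        key = key[:-len(".dita")]  # drop the file extension
    return key


key = key_from_href(href)
```

Here unchanged is still equal to href, which matches the symptom described above.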
A regular expression worked. Here is the final code:
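import os
import re
from bs4 import BeautifulSoup as bs

globpath = "C:/DATA"  # add your directory path here

def main(path):
    with open(path, encoding="utf-8") as f:
        s = f.read()
    s = bs(s, "xml")
    imgs = s.find_all("topicref")
    for i in imgs:
        if "section" in i["href"]:
            try:
                # Keep "section" plus everything up to the first "."
                i["keys"] = re.findall(r"section[^.]*", i["href"])[0]
            except IndexError:
                print("Could not replace")
    s = str(s)
    with open(path, "w", encoding="utf-8") as f:
        f.write(s)

# Same directory walk as in the first attempt
for dirpath, directories, files in os.walk(globpath):
    for fname in files:
        if fname.endswith(".ditamap"):
            main(os.path.join(dirpath, fname))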
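For anyone who wants to check the pattern before rewriting files, here is a small standalone sketch that runs the extraction on the three sample hrefs from the question. The names KEY_PATTERN and extract_key are my own, and the pattern is slightly stricter than the one above (it requires the underscore and also stops at "/"), so treat it as one possible variant rather than the exact expression used in the final code.

```python
import re

# Assumed variant of the pattern: "section_" followed by anything
# that is not a dot or a slash.
KEY_PATTERN = re.compile(r"section_[^./]*")


def extract_key(href):
    """Return the section id found in href, or None if there is none."""
    match = KEY_PATTERN.search(href)
    return match.group(0) if match else None


hrefs = [
    "xyz/debug_logging_in_xyz-section_i_y_mn.dita",
    "xyz/workflows_id-section_exf_zaz_lo.dita",
    "xyz/images_id-section_ekl_bbz_lo.dita",
]
keys = [extract_key(h) for h in hrefs]
```

Using re.search and checking for None avoids indexing into an empty list, so no try/except is needed around the lookup.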