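I have an XML file. Just from reading that, I can tell you're excited.
Now I'd like to remove some tags from it entirely:
<qwerty option=1>
  <nmo>sdfsdf</nmo>
  <blue>sdfsdf</blue>
</qwerty>
It's a large file. How would I go about removing all nmo and blue tags, including their contents? In Emacs or anything else my Mac can use.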
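Emacs has commands for navigating symbolic expressions, or "sexps". In xml-mode, the sexp navigation commands apply to tags. You can move to the opening <, then press C-M-f (forward-sexp) to move to the end of the tag, or press C-M-k (kill-sexp) to kill it. The variable nxml-sexp-element-flag controls whether these commands stop at the end of the start tag (the default when this was written; check C-h v nxml-sexp-element-flag in your Emacs) or at the end of the matching end tag, which treats the whole element as one sexp. I prefer the latter.
To delete these tags, first set nxml-sexp-element-flag to a non-nil value using M-x customize-variable RET nxml-sexp-element-flag. Next, search for the tag you want to delete, move point to its opening <, and press C-M-k. Wrap all of this in a keyboard macro and repeat it through the file until the search fails.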
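I assume that your XML file is well formed. I also assume that, contrary to your example, your "real" data is a bit more complex than one tag per line (except for the root tag). Otherwise, do we agree that it would be as simple as deleting the lines that contain the given tags?
Here is a proposal for a function that would do the trick: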
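(defun my-remove-tag (tag)
  "Delete every <TAG ...>...</TAG> element in the current buffer.
Assumes well-formed XML with no nested or self-closing TAG elements."
  (save-excursion
    (goto-char (point-min))             ; cover the whole buffer
    (let ((case-fold-search nil)        ; XML names are case-sensitive
          (open (concat "<" (regexp-quote tag) "\\(?:[ \t\n][^>]*\\)?>"))
          (close (concat "</" tag ">")))
      ;; NOERROR = t makes the loop stop quietly when no opening tag is left.
      (while (re-search-forward open nil t)
        (delete-region (match-beginning 0)
                       (search-forward close))))))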
You can call this function to remove nmo, blue, or qwerty elements, like this:
(my-remove-tag "nmo")
(my-remove-tag "qwerty")
The rationale is to look for an opening tag, then look for the matching closing tag, and delete everything in between, tags included. An opening tag may carry attributes, and the function handles that case.
Case folding is disabled while the function runs, so tag names are matched case-sensitively. Point is restored afterwards by the usual macro, save-excursion.
I removed the outer let. There is no need to restore the value of case-fold-search by hand: the let binding simply shadows the global value, which becomes visible again as soon as the let form exits.
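For anyone who wants to try the same search-then-delete idea outside Emacs, here is a rough Python sketch of it. It is only an illustration and not part of the answer above: the name remove_tag and the sample string are made up for the example. Like the Emacs function, it works on plain text, so it assumes well-formed input and will misbehave if an element contains another element of the same name.

```python
import re


def remove_tag(text, tag):
    """Delete every <tag ...>...</tag> span from text (plain-text matching)."""
    name = re.escape(tag)
    # The optional group lets the opening tag carry attributes, but requires
    # whitespace after the name so that "nmo" does not also match "<nmox>".
    # ".*?" is non-greedy: it stops at the first closing tag it finds.
    pattern = re.compile(r"<" + name + r"(?:\s[^>]*)?>.*?</" + name + r">",
                         re.DOTALL)
    return pattern.sub("", text)


# Hypothetical sample data, loosely based on the question.
sample = '<qwerty option="1"><nmo>a</nmo><nmox>keep</nmox><blue x="1">b\nc</blue></qwerty>'
cleaned = remove_tag(remove_tag(sample, "nmo"), "blue")
print(cleaned)
```

The non-greedy match plays the role of the inner search-forward: it ends at the first closing tag after the opening one, which is exactly why nested elements of the same name are not handled.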
I believe that a more generic approach would be to use some more XML-oriented tool, like XSL(T) (don't be afraid, no one likes that), but it can come in handy if you have to work with XML (don't be afraid, no one likes that either).
So, here we go:
This is your XSL file. It copies everything in the original XML and replaces the nodes you wanted to remove with nothing. It also indents the output, which makes it look somewhat prettier than what you would get by replacing with a regexp.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="msxsl"
>
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Copy everything -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Find any node named nmo or blue and replace it with nothing -->
  <xsl:template match="nmo | blue"/>
</xsl:stylesheet>
This is my example I used to test:
<?xml version="1.0" encoding="utf-8"?>
<nodes>
  <qwerty option="1">
    <nmo>sdfsdf</nmo>
    <blue>sdfsdf</blue>
  </qwerty>
  <nodes>
    <qwerty option="1">
      <nmo>sdfsdf</nmo>
      <blue>sdfsdf</blue>
    </qwerty>
  </nodes>
  <nodes>
    <qwerty option="1">
      <nmo>sdfsdf</nmo>
      <blue>sdfsdf</blue>
    </qwerty>
    <other node=""/>
    <nodes>
      <qwerty option="1">
        <nmo>sdfsdf</nmo>
        <blue>sdfsdf</blue>
      </qwerty>
      <qwerty option="1">
        <nmo>sdfsdf</nmo>
        <blue>sdfsdf</blue>
      </qwerty>
      <qwerty option="1">
        <nmo>sdfsdf</nmo>
        <blue>sdfsdf</blue>
      </qwerty>
    </nodes>
  </nodes>
</nodes>
And this is the output I'm receiving:
<?xml version="1.0"?>
<nodes>
  <qwerty option="1"/>
  <nodes>
    <qwerty option="1"/>
  </nodes>
  <nodes>
    <qwerty option="1"/>
    <other node=""/>
    <nodes>
      <qwerty option="1"/>
      <qwerty option="1"/>
      <qwerty option="1"/>
    </nodes>
  </nodes>
</nodes>
Notice that the qwerty nodes, now empty, are also written out as self-closing tags.
The command line to get this would be something like:
xsltproc ./remove-nodes.xsl ./nodes-to-be-removed.xml > result.xml
You could run it from Emacs' shell, or use any of Emacs' functions for calling an external program or creating a process. See man xsltproc for more information; its usage is really basic. It was already installed on my Fedora, and given how widespread XML is, I would expect it to be either preinstalled on a Mac or installable in some way.
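If xsltproc turns out not to be at hand, the same removal can be sketched with Python's standard library, which is another thing a Mac can usually run. This is an illustration under assumptions, not part of the answer above: the function name remove_elements and the inline sample are invented for the example, and unlike the stylesheet it does not re-indent the output.

```python
import xml.etree.ElementTree as ET


def remove_elements(root, names):
    """Remove every descendant of root whose tag is in names (in place)."""
    # Materialize both iterations with list(): the tree is modified
    # inside the loop, and mutating while iterating would skip nodes.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in names:
                parent.remove(child)
    return root


# Hypothetical sample data, a cut-down version of the test file above.
sample = """<nodes>
  <qwerty option="1">
    <nmo>sdfsdf</nmo>
    <blue>sdfsdf</blue>
  </qwerty>
  <other node=""/>
</nodes>"""

root = remove_elements(ET.fromstring(sample), {"nmo", "blue"})
remaining = [element.tag for element in root.iter()]
print(remaining)
```

For a real file you would presumably load it with ET.parse, pass the result of getroot() to the function, and save with the tree's write method. Because this works on the parsed tree rather than on text, nested elements and attributes are not a problem, which is the same advantage the XSLT approach has over a regexp.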