ruby - How do I recursively remove empty child elements at a specific XPath location in XML using Nokogiri?

Question

I have the following XML, in which several child elements have empty text.

doc = <<'XML'
<Book>
    <BookId>BK45647</BookId>
    <BookName>The Client by John Grisham</BookName>
    <BookAuthenticationCode></BookAuthenticationCode>
    <BookCategory>Suspense</BookCategory>
    <BookSequence></BookSequence>
    <BookPublisherInfo>
        <PublisherId>PBBK12345</PublisherId>
        <PublisherName>Mc.GrawHill</PublisherName>
        <PublisherIndex></PublisherIndex>
        <PublisherCategoryQuota></PublisherCategoryQuota>
    </BookPublisherInfo>
    <BookPurchaselist>
       <Customer>
           <FirstName>John</FirstName>
           <LastName>Smith</LastName>
           <MiddleName></MiddleName>
           <NickName></NickName>
       </Customer>
        <Customer>
           <FirstName>Winston</FirstName>
           <LastName>Churchill</LastName>
           <MiddleName></MiddleName>
           <NickName></NickName>
       </Customer>
    </BookPurchaselist>
</Book>
XML

I tried the code below, but for some reason it does not work properly.

cust = doc.at_xpath("//Customer")
cust.each do |cust_obj|
    if cust_obj.has_text? == false
       cust_obj.delete
    end
end

It removes only some of the empty elements and gives the following output:

<Book>
    <BookId>BK45647</BookId>
    <BookName>The Client by John Grisham</BookName>
    <BookAuthenticationCode></BookAuthenticationCode>
    <BookCategory>Suspense</BookCategory>
    <BookSequence></BookSequence>
    <BookPublisherInfo>
        <PublisherId>PBBK12345</PublisherId>
        <PublisherName>Mc.GrawHill</PublisherName>
        <PublisherIndex></PublisherIndex>
        <PublisherCategoryQuota></PublisherCategoryQuota>
    </BookPublisherInfo>
    <BookPurchaselist>
       <Customer>
           <FirstName>John</FirstName>
           <LastName>Smith</LastName>
           <MiddleName></MiddleName>
       </Customer>
        <Customer>
           <FirstName>Winston</FirstName>
           <LastName>Churchill</LastName>
           <NickName></NickName>
       </Customer>
    </BookPurchaselist>
</Book>

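Some of the elements with empty text get removed, while others are left in place. How can I recursively remove the elements with empty data under a specific XPath and write the XML back out?

I'm stuck here and would appreciate suggestions.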

score 4 · Accepted Answer

doc.xpath('//Customer/child::*[not(text())]').each do |node|
  node.remove
end

You can also use not(node()) as the predicate if you want to remove only the nodes that have no child nodes at all.

Edit: a complete working example (using the same code as above)

require 'nokogiri'

xml = <<-XML
<Book>
    <BookId>BK45647</BookId>
    <BookName>The Client by John Grisham</BookName>
    <BookAuthenticationCode></BookAuthenticationCode>
    <BookCategory>Suspense</BookCategory>
    <BookSequence></BookSequence>
    <BookPublisherInfo>
        <PublisherId>PBBK12345</PublisherId>
        <PublisherName>Mc.GrawHill</PublisherName>
        <PublisherIndex></PublisherIndex>
        <PublisherCategoryQuota></PublisherCategoryQuota>
    </BookPublisherInfo>
    <BookPurchaselist>
       <Customer>
           <FirstName>John</FirstName>
           <LastName>Smith</LastName>
           <MiddleName></MiddleName>
       </Customer>
        <Customer>
           <FirstName>Winston</FirstName>
           <LastName>Churchill</LastName>
           <NickName></NickName>
       </Customer>
    </BookPurchaselist>
</Book>
XML

doc = Nokogiri.parse(xml)

doc.xpath('//Customer/child::*[not(text())]').each do |node|
  node.remove
end

puts doc.to_s

The output of this program is:

<?xml version="1.0"?>
<Book>
    <BookId>BK45647</BookId>
    <BookName>The Client by John Grisham</BookName>
    <BookAuthenticationCode/>
    <BookCategory>Suspense</BookCategory>
    <BookSequence/>
    <BookPublisherInfo>
        <PublisherId>PBBK12345</PublisherId>
        <PublisherName>Mc.GrawHill</PublisherName>
        <PublisherIndex/>
        <PublisherCategoryQuota/>
    </BookPublisherInfo>
    <BookPurchaselist>
       <Customer>
           <FirstName>John</FirstName>
           <LastName>Smith</LastName>

       </Customer>
        <Customer>
           <FirstName>Winston</FirstName>
           <LastName>Churchill</LastName>

       </Customer>
    </BookPurchaselist>
</Book>
