
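My goal: in the subject XML document, I want to get every element named "SECTION", that is, each SECTION together with everything beneath it.

Constraint: I must use LibXML Ruby; that is, require 'xml'.

Problem: the output data is truncated.

Questions (see the output for file1.xml):

  • Why is the output for file1.xml truncated? Note that most of the text between the first <P>(a)...</P> tags is missing (the truncation begins at the word "ethic...").
  • Why does the code drop the last two P elements (P(b)..., P(2)...) and the CITA element? What causes <?xml version="1.0" encoding="UTF-8"?> and <SECTION/> to appear at the end of the output?

Note: the output for file2.xml shows more severe truncation. I have included it in case it clarifies anything.

Here is the code: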

#!/usr/bin/ruby
require "xml"
reader = XML::Reader.file('infile2.xml')
while reader.read
  node = reader.node
  if node.name == "SECTION"
    iteration = XML::Document.string(node.to_s)
    puts iteration
    puts "\n"
  end
end

Input file1.xml:

<?xml version="1.0"?>
<SECTION>
  <SECTNO>§ 0.735-1</SECTNO>
  <SUBJECT>Agency ethics officials.</SUBJECT>
  <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethics program, pursuant to 5 CFR 2638.201-204.</P>
  <P>(b) <E T="03">Deputy ethics officials.</E> (1) The Regional Counsel are deputy ethics officials. They have been delegated the authority to act for the DAEO within their jurisdiction, under the DAEO's supervision, pursuant to 5 CFR 2638.204.</P>
  <P>(2) The alternate DAEO, the DAEO's staff, and staff in the Offices of Regional Counsel, may also act as deputy ethics officials pursuant to delegations of one or more of the DAEO's duties from the DAEO or the Regional Counsel.</P>
  <CITA>[58 FR 61813, Nov. 23, 1993. Redesignated at 61 FR 11309, Mar. 20, 1996]</CITA>
</SECTION>

Output, given input file1.xml (above):

<?xml version="1.0" encoding="UTF-8"?>
<SECTION>
  <SECTNO>§ 0.735-1</SECTNO>
  <SUBJECT>Agency ethics officials.</SUBJECT>
  <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethic</P></SECTION>

<?xml version="1.0" encoding="UTF-8"?>
<SECTION/>

Input file2.xml:

<?xml version="1.0"?>
<SUBPART>
  <HD SOURCE="HED">Subpart A—General Provisions</HD>
  <SECTION>
    <SECTNO>§ 0.735-1</SECTNO>
    <SUBJECT>Agency ethics officials.</SUBJECT>
    <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethics program, pursuant to 5 CFR 2638.201-204.</P>
    <P>(b) <E T="03">Deputy ethics officials.</E> (1) The Regional Counsel are deputy ethics officials. They have been delegated the authority to act for the DAEO within their jurisdiction, under the DAEO's supervision, pursuant to 5 CFR 2638.204.</P>
    <P>(2) The alternate DAEO, the DAEO's staff, and staff in the Offices of Regional Counsel, may also act as deputy ethics officials pursuant to delegations of one or more of the DAEO's duties from the DAEO or the Regional Counsel.</P>
    <CITA>[58 FR 61813, Nov. 23, 1993. Redesignated at 61 FR 11309, Mar. 20, 1996]</CITA>
  </SECTION>
  <SECTION>
    <SECTNO>§ 0.735-2</SECTNO>
    <SUBJECT>Government-wide standards.</SUBJECT>
    <P>For government-wide standards of ethical conduct and related responsibilities for Federal employees, see 5 CFR Part 735 and Chapter XVI.</P>
    <CITA>[61 FR 11309, Mar. 20, 1996. Redesignated at 63 FR 33579, June 19, 1998]</CITA>
  </SECTION>
</SUBPART>

Output, given input file2.xml (above):

<?xml version="1.0" encoding="UTF-8"?>
<SECTION>
    <SECTNO>§ 0.735-1</SECTNO>
    <SUBJECT>Agency ethics officials.</SUBJECT>
    <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E></P></SECTION>

<?xml version="1.0" encoding="UTF-8"?>
<SECTION/>

<?xml version="1.0" encoding="UTF-8"?>
<SECTION>
    <SECTNO>§ 0.735-2</SECTNO>
    <SUBJECT>Government-wide standards.</SUBJECT>
    <P>For government-wide standards of ethical conduct and related responsibilities for Federal employees, see 5 CFR Part 735 and Chapter XVI.</P>
    <CITA/></SECTION>

<?xml version="1.0" encoding="UTF-8"?>
<SECTION/>

1 Answer


Unless you have a huge XML document, consider the following:

require "xml"
doc = XML::Document.file('infile1.xml')
doc.find('/SECTION').each do |s|
  puts "[#{s}]"
end

This outputs:

<SECTION>
  <SECTNO>§ 0.735-1</SECTNO>
  <SUBJECT>Agency ethics officials.</SUBJECT>
  <P>(a) <E T="03">Designated Agency Ethics Official (DAEO).</E> The Assistant General Counsel (023) is the designated agency ethics official (DAEO) for the Department of Veterans Affairs. The Deputy Assistant General Counsel (023C) is the alternate DAEO, who is designated to act in the DAEO's absence. The DAEO has primary responsibility for the administration, coordination, and management of the VA ethics program, pursuant to 5 CFR 2638.201-204.</P>
  <P>(b) <E T="03">Deputy ethics officials.</E> (1) The Regional Counsel are deputy ethics officials. They have been delegated the authority to act for the DAEO within their jurisdiction, under the DAEO's supervision, pursuant to 5 CFR 2638.204.</P>
  <P>(2) The alternate DAEO, the DAEO's staff, and staff in the Offices of Regional Counsel, may also act as deputy ethics officials pursuant to delegations of one or more of the DAEO's duties from the DAEO or the Regional Counsel.</P>
  <CITA>[58 FR 61813, Nov. 23, 1993. Redesignated at 61 FR 11309, Mar. 20, 1996]</CITA>
</SECTION>

This doesn't answer the question, but it is a workaround.

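I'm not sure what the actual problem with using the reader is, but I suspect it has to do with the cursor. For example, the following works, except that the first XML document also produces an extra blank section: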

if node.name == "SECTION"
  puts reader.read_outer_xml
end
answered 2013-03-08T20:38:47.897