ruby - XML data scanner in Ruby: case statement or dynamic dispatch for performance?

Question

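I am writing an XML data scanner that reads XML text with an XML parser library (nokogiri, etc.) and builds a tree of nodes. I need to create an object for each XML element, so I need a method that creates an object from a given element name and attributes, like this, regardless of which parser style (SAX or DOM) I use: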

create_node(name, attributes_hash)

This method needs to branch on name. The possible implementations are:

A case statement
Method dispatch to predefined methods
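
As a rough illustration of the two options, here is a minimal sketch. The class names (CaseFactory, DispatchFactory), the Node struct, the node_ prefix, and the item/title element names are all hypothetical and chosen only for this example.

```ruby
# Hypothetical node type used only for this illustration.
Node = Struct.new(:kind, :attributes)

# Option 1: branch on the element name with a case statement.
class CaseFactory
  def create_node(name, attributes_hash)
    case name
    when "item"  then Node.new(:item, attributes_hash)
    when "title" then Node.new(:title, attributes_hash)
    else raise ArgumentError, "unknown element: #{name}"
    end
  end
end

# Option 2: dispatch to a predefined method whose name is
# derived from the element name.
class DispatchFactory
  def create_node(name, attributes_hash)
    method_name = "node_#{name}"
    # Only prefixed methods can be reached; the second argument
    # makes respond_to? consider private methods as well.
    unless respond_to?(method_name, true)
      raise ArgumentError, "unknown element: #{name}"
    end
    send(method_name, attributes_hash)
  end

  private

  def node_item(attributes_hash)
    Node.new(:item, attributes_hash)
  end

  def node_title(attributes_hash)
    Node.new(:title, attributes_hash)
  end
end
```

Both factories return the same node for a known element name and raise ArgumentError for an unknown one; they differ only in how the branch is selected.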

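Since this method could become a bottleneck, I wrote a benchmark script to check Ruby's performance. (The benchmark script is attached at the end of this question. I don't like some parts of the script, especially how the case statement is created, so comments on how to improve that are also welcome, but please post them as comments rather than answers. I may need to open a separate question for that as well.)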

The script measures the following four cases at two range sizes:

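Method dispatch with a constant name
Method dispatch with the name built by #{} interpolation
Method dispatch with the name built by + concatenation
A case statement that calls the same methods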

Results:

                                                 user     system      total        real
a to z: method_calls (with const name)       0.090000   0.000000   0.090000 (  0.092516)
a to z: method_calls (with dynamic name) 1   0.180000   0.000000   0.180000 (  0.181793)
a to z: method_calls (with dynamic name) 2   0.200000   0.000000   0.200000 (  0.202818)
a to z: switch_calls                         0.130000   0.000000   0.130000 (  0.132633)

                                                user     system      total        real
a to zz: method_calls (with const name)       2.900000   0.000000   2.900000 (  2.894273)
a to zz: method_calls (with dynamic name) 1   6.500000   0.010000   6.510000 (  6.507099)
a to zz: method_calls (with dynamic name) 2   6.980000   0.000000   6.980000 (  6.987534)
a to zz: switch_calls                         4.750000   0.000000   4.750000 (  4.742448)

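I observed that method dispatch with a constant name is faster than the case statement. However, when string operations are involved in determining the method name, building the name costs more than the method call itself, which effectively makes options 2 and 3 slower than option 4. The difference between options 2 and 3 is negligible.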

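To keep the scanner safe, I would prefer to add a prefix to the method names, because without one, XML could be crafted to invoke methods I do not want called. But the cost of determining the method name is not negligible.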

How do you write scanners like this? I would like answers to the following questions:

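Is there a better approach than the ones above?
If not, which would you choose: case-when or method dispatch?
It is faster if I do not compute the method name. Is there a good way to do method dispatch safely? (For example, by restricting the node names that can be invoked.)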
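
One possible way to avoid computing the method name on every call, while still restricting which methods XML input can reach, is to build the name-to-method table once. The sketch below is an assumption about how that might look, not a measured recommendation; SafeScanner, HANDLERS, and the element names are hypothetical.

```ruby
class SafeScanner
  # Built once at load time. Only these element names can be
  # dispatched, and no string is built per call.
  HANDLERS = {
    "item"  => :node_item,
    "title" => :node_title
  }.freeze

  def create_node(name, attributes_hash)
    handler = HANDLERS[name]
    raise ArgumentError, "unknown element: #{name}" unless handler
    send(handler, attributes_hash)
  end

  private

  # Placeholder handlers; a real scanner would build node objects here.
  def node_item(attributes_hash)
    [:item, attributes_hash]
  end

  def node_title(attributes_hash)
    [:title, attributes_hash]
  end
end
```

Because the lookup goes through a fixed table, an element named, say, instance_eval never reaches send. This resembles the "const name" variant in the benchmark, where the method names are precomputed; whether it matters in a real scanner would still need to be measured.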

Benchmark script

# Benchmark to measure the difference of
# use of case statement and message passing

require 'benchmark'

def bench(title, tobj, count)
  Benchmark.bmbm do |b|
    b.report "#{title}: method_calls (with const name)" do
      (1..count).each do |c|
        tobj.run_send_using_const
      end
    end

    b.report "#{title}: method_calls (with dynamic name) 1" do
      (1..count).each do |c|
        tobj.run_send_using_dynamic_1
      end
    end

    b.report "#{title}: method_calls (with dynamic name) 2" do
      (1..count).each do |c|
        tobj.run_send_using_dynamic_2
      end
    end

    b.report "#{title}: switch_calls" do
      (1..count).each do |c|
        tobj.run_switch
      end
    end
  end
end


class Switcher
  def initialize(names)
    @method_names = { }
    @names = names
    names.each do |n|
      @method_names[n] = "dynamic_#{n}"
      mname = "dynamic_#{n}"
      # Define the method on this instance only (no class variable needed).
      define_singleton_method(mname) { mname }
    end

    swst = ""
    names.each do |n|
      swst << "when \"#{n}\" then dynamic_#{n}\n"
    end

    st = "
    def run_switch_each(n)
      case n
#{swst}
      end
    end
    "
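    # instance_eval defines run_switch_each on this instance only,
    # so a later Switcher.new does not overwrite it.
    instance_eval(st)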
  end

  def run_send_using_const
    @method_names.each_value do |n|
      self.send n
    end
  end

  def run_send_using_dynamic_1
    @names.each do |n|
      self.send "dynamic_#{n}"
    end
  end

  def run_send_using_dynamic_2
    @names.each do |n|
      self.send "dynamic_" + n
    end
  end

  def run_switch
    @names.each do |n|
      run_switch_each(n)
    end
  end

end


sw1 = Switcher.new('a'..'z')
sw2 = Switcher.new('a'..'zz')

bench("a to z", sw1, 10000)
bench("a to zz", sw2, 10000)

score 2 · Accepted Answer

I believe this is a case of premature optimization.

"But the cost of determining the method name is not negligible."

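Not negligible compared to what? The approaches here have different performance numbers, but is the time spent dispatching a node comparable to the time spent parsing the node (with Nokogiri, etc.), constructing the specialized node object, and doing whatever you need with it?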

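I believe not. I don't have a benchmark to back that claim (you would need the actual code for that), but the fact that string concatenation versus string interpolation actually produces a noticeable difference in the results (dynamic 1 vs dynamic 2) is a good indicator that you are measuring something trivial.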

Or that adding one string concatenation per dispatch makes the resulting time 2 to 2.5 times larger (const vs dynamic 2).
