
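I am trying to use the open-nlp Ruby gem to access the Java OpenNLP processor through RJB (Ruby Java Bridge). I am not a Java programmer, so I don't know how to solve this. Any recommendations on resolving it, debugging it, or collecting more information would be appreciated.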

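The environment is Windows 8, Ruby 1.9.3p448, Rails 4.0.0, JDK 1.7.0-40 x586. The gems are rjb 1.4.8 and louismullie/open-nlp 0.1.4. For the record, this file runs in JRuby, but I ran into other problems in that environment and would prefer to stay with native Ruby for now.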

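In brief, the open-nlp gem fails with a java.lang.NullPointerException and a Ruby method_missing error. I hesitate to say why this is happening because I don't know, but it appears to me that the dynamically loaded Jar class opennlp.tools.postag.POSTaggerME@1b5080a cannot be accessed, perhaps because OpenNLP::Bindings::Utils.tagWithArrayList is not set up correctly. OpenNLP::Bindings is Ruby. Utils and its methods are Java. Utils is supposed to be among the "default" Jars and Class files, which may be important.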

What am I doing wrong here? Thanks!

The code I am running is copied directly from github/open-nlp. My copy of the code is:

class OpennlpTryer

  $DEBUG=false

  # From https://github.com/louismullie/open-nlp
  # Hints: Dir.pwd; File.expand_path('../../Gemfile', __FILE__);
  # Load the module
  require 'open-nlp'
  #require 'jruby-jars'

=begin
  # Alias "write" to "print" to monkeypatch the NoMethod write error
  java_import java.io.PrintStream
  class PrintStream
    java_alias(:write, :print, [java.lang.String])
  end
=end

=begin
  # Display path of jruby-jars jars...
  puts JRubyJars.core_jar_path # => path to jruby-core-VERSION.jar
  puts JRubyJars.stdlib_jar_path # => path to jruby-stdlib-VERSION.jar
=end
  puts ENV['CLASSPATH']

  # Set an alternative path to look for the JAR files.
  # Default is gem's bin folder.
  # OpenNLP.jar_path = '/path_to_jars/'

  OpenNLP.jar_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
  puts OpenNLP.jar_path
  # Set an alternative path to look for the model files.
  # Default is gem's bin folder.
  # OpenNLP.model_path = '/path_to_models/'

  OpenNLP.model_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
  puts OpenNLP.model_path
  # Pass some alternative arguments to the Java VM.
  # Default is ['-Xms512M', '-Xmx1024M'].
  # OpenNLP.jvm_args = ['-option1', '-option2']
  OpenNLP.jvm_args = ['-Xms512M', '-Xmx1024M']
  # Redirect VM output to log.txt
  OpenNLP.log_file = 'log.txt'
  # Set default models for a language.
  # OpenNLP.use :language
  OpenNLP.use :english          # Make sure this is lower case!!!!

# Simple tokenizer

  OpenNLP.load

  sent = "The death of the poet was kept from his poems."
  tokenizer = OpenNLP::SimpleTokenizer.new

  tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
  puts "Tokenize #{tokens}"

# Maximum entropy tokenizer, chunker and POS tagger

  OpenNLP.load

  chunker = OpenNLP::ChunkerME.new
  tokenizer = OpenNLP::TokenizerME.new
  tagger = OpenNLP::POSTaggerME.new

  sent = "The death of the poet was kept from his poems."

  tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
  puts "Tokenize #{tokens}"

  tags = tagger.tag(tokens).to_a
# => %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]
  puts "Tags #{tags}"

  chunks = chunker.chunk(tokens, tags).to_a
# => %w[B-NP I-NP B-PP B-NP I-NP B-VP I-VP B-PP B-NP I-NP O]
  puts "Chunks #{chunks}"


# Abstract Bottom-Up Parser

  OpenNLP.load

  sent = "The death of the poet was kept from his poems."
  parser = OpenNLP::Parser.new
  parse = parser.parse(sent)

=begin
  parse.get_text.should eql sent

  parse.get_span.get_start.should eql 0
  parse.get_span.get_end.should eql 46
  parse.get_child_count.should eql 1
=end

  child = parse.get_children[0]

  child.text # => "The death of the poet was kept from his poems."
  child.get_child_count # => 3
  child.get_head_index #=> 5
  child.get_type # => "S"

  puts "Child: #{child}"

# Maximum Entropy Name Finder*

  OpenNLP.load

  # puts File.expand_path('.', __FILE__)
  text = File.read('./spec/sample.txt').gsub!("\n", "")

  tokenizer = OpenNLP::TokenizerME.new
  segmenter = OpenNLP::SentenceDetectorME.new
  puts "Tokenizer: #{tokenizer}"
  puts "Segmenter: #{segmenter}"

  ner_models = ['person', 'time', 'money']
  ner_finders = ner_models.map do |model|
    OpenNLP::NameFinderME.new("en-ner-#{model}.bin")
  end
  puts "NER Finders: #{ner_finders}"

  sentences = segmenter.sent_detect(text)
  puts "Sentences: #{sentences}"

  named_entities = []

  sentences.each do |sentence|
    tokens = tokenizer.tokenize(sentence)
    ner_models.each_with_index do |model, i|
      finder = ner_finders[i]
      name_spans = finder.find(tokens)
      name_spans.each do |name_span|
        start = name_span.get_start
        stop = name_span.get_end-1
        slice = tokens[start..stop].to_a
        named_entities << [slice, model]
      end
    end
  end
  puts "Named Entities: #{named_entities}"

# Loading specific models
# Just pass the name of the model file to the constructor. The gem will search for the file in the OpenNLP.model_path folder.

  OpenNLP.load

  tokenizer = OpenNLP::TokenizerME.new('en-token.bin')
  tagger = OpenNLP::POSTaggerME.new('en-pos-perceptron.bin')
  name_finder = OpenNLP::NameFinderME.new('en-ner-person.bin')
# etc.
  puts "Tokenizer: #{tokenizer}"
  puts "Tagger: #{tagger}"
  puts "Name Finder: #{name_finder}"

# Loading specific classes
# You may want to load specific classes from the OpenNLP library that are not loaded by default. The gem provides an API to do this:

# Default base class is opennlp.tools.
  OpenNLP.load_class('SomeClassName')
# => OpenNLP::SomeClassName

# Here, we specify another base class.
  OpenNLP.load_class('SomeOtherClass', 'opennlp.tools.namefind')
  # => OpenNLP::SomeOtherClass

end

The failing line is line 73 (tokens == the sentence being processed):

  tags = tagger.tag(tokens).to_a  # 
# => %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]

tagger.tag calls open-nlp/classes.rb line 13, which is where the error is raised. The code there is:

class OpenNLP::POSTaggerME < OpenNLP::Base

  unless RUBY_PLATFORM =~ /java/
    def tag(*args)
      OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0])  # <== Line 13
    end
  end

end

The Ruby error raised at this point is: `method_missing': unknown exception (NullPointerException). Debugging this, I found that the underlying error is java.lang.NullPointerException. args[0] is the sentence being processed. @proxy_inst is opennlp.tools.postag.POSTaggerME@1b5080a.
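To gather more information before the bridge call, it may help to log what is actually being passed in. The following is a minimal sketch; `describe_arg` is a hypothetical helper, not part of open-nlp or Rjb, and it only inspects plain Ruby values.

```ruby
# Hypothetical debugging helper; not part of open-nlp or Rjb.
# Summarizes an argument's Ruby class and, for arrays, its element classes.
def describe_arg(arg)
  return arg.class.name unless arg.is_a?(Array)

  element_classes = arg.map { |element| element.class.name }.uniq
  "Array(#{arg.length}) of #{element_classes.join(', ')}"
end

tokens = %w[The death of the poet]
puts describe_arg(tokens)            # Array(5) of String
puts describe_arg("a plain string")  # String
```

Calling something like this on args[0] just before the Utils call would show whether the tagger receives a Ruby Array of Strings or some other object.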

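OpenNLP::Bindings sets up the Java environment. For example, it specifies the Jars to load and the classes within those Jars. At line 54, it sets the defaults for RJB, which should set up OpenNLP::Bindings::Utils and its methods, as follows: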

  # Add in Rjb workarounds.
  unless RUBY_PLATFORM =~ /java/
    self.default_jars << 'utils.jar'
    self.default_classes << ['Utils', '']
  end

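utils.jar and Utils.java are on the CLASSPATH, along with the other Jars being loaded. I have verified that they are being accessed, because error messages are thrown if the other Jars are not present. The CLASSPATH is: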

.;C:\Program Files (x86)Java\jdk1.7.0_40\lib;C:\Program Files (x86)Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin
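A quick way to confirm that each CLASSPATH entry actually exists on disk is sketched below. `missing_classpath_entries` is a hypothetical helper, and the check only tests for existence; it says nothing about whether the JVM started by RJB uses those entries.

```ruby
# Hypothetical helper: returns the CLASSPATH entries that do not exist on disk.
# The separator defaults to the platform's path separator (";" on Windows).
def missing_classpath_entries(classpath, separator = File::PATH_SEPARATOR)
  classpath.split(separator).reject do |entry|
    entry.empty? || File.exist?(entry)
  end
end

# Report any entries from the current environment that cannot be found.
missing = missing_classpath_entries(ENV.fetch('CLASSPATH', ''))
puts "Missing CLASSPATH entries: #{missing.inspect}"
```

An entry with a mistyped directory name would show up in the returned list.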

The application Jars are in D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin, and again, if they are not there I get error messages on the other Jars. The Jars and Java files in ...\bin include:

jwnl-1.3.3.jar
opennlp-maxent-3.0.2-incubating.jar
opennlp-tools-1.5.2-incubating.jar
opennlp-uima-1.5.2-incubating.jar
utils.jar
Utils.java

Utils.java is as follows:

import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;

// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {

    public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
      return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
      return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
      return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(ArrayList[] objectArray) {
      String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
          return stringArray;
    }
}

So it should define tagWithArrayList and import opennlp.tools.postag.POSTagger. (OBTW, just to try it, I changed the occurrences of POSTagger to POSTaggerME in this file. It changed nothing...)

The tools Jar file, opennlp-tools-1.5.2-incubating.jar, includes the postag/POSTagger and POSTaggerME class files, as expected.

The error messages are:

D:\BitNami\rubystack-1.9.3-12\ruby\bin\ruby.exe -e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift) D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb
.;C:\Program Files (x86)\Java\jdk1.7.0_40\lib;C:\Program Files (x86)\Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/bin/
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/bin/
Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `method_missing': unknown exception (NullPointerException)
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
    from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:73:in `<class:OpennlpTryer>'
    from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

Revised Utils.java:

import java.util.Arrays;
import java.util.Object;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;

// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {

    public static String[] tagWithArrayList(POSTagger posTagger, Object[] objectArray) {
      return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, Object[] tokens) {
      return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, Object[] tokens, Object[] tags) {
      return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(Object[] objectArray) {
      String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
          return stringArray;
    }
}

Revised error messages:

Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'

Errors after correcting Utils.java to use "import java.lang.Object;":

Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'

With the rescue removed from OpennlpTryer, the error shows up in classes.rb:

Uncaught exception: uninitialized constant OpenNLP::POSTaggerME::ArrayStoreException
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:16:in `rescue in tag'
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
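The "uninitialized constant ... ArrayStoreException" message appears to be a Ruby NameError rather than the Java failure itself: a rescue clause that names a constant Ruby cannot resolve raises NameError at the moment Ruby tries to match the in-flight exception. A minimal sketch of that behavior follows, with `risky_call` as a hypothetical stand-in for the bridge call.

```ruby
# Hypothetical stand-in for the failing bridge call.
def risky_call
  raise RuntimeError, "unknown exception"
end

def tag_with_unresolved_rescue
  risky_call
rescue ArrayStoreException   # no such Ruby constant is defined here
  :handled
end

begin
  tag_with_unresolved_rescue
rescue NameError => e
  # The NameError replaces the original RuntimeError.
  puts "#{e.class}: #{e.message}"
end
```

Rescuing a class that Ruby can resolve, such as StandardError, avoids masking the original error this way.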

Same error, but with all rescues removed, so it is "native Ruby":

Uncaught exception: unknown exception
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `method_missing'
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `tag'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
    D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'

Revised Utils.java:

import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;

// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {

    public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
      // Log the runtime type of the incoming array before tagging.
      System.out.println("Tokens: ("+objectArray.getClass().getSimpleName()+"): \n"+objectArray);
      return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
      return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
      return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(ArrayList[] objectArray) {
      String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
          return stringArray;
    }
}

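I ran cavaj on Utils.class, which I unzipped from utils.jar, and this is what I found. It differs significantly from Utils.java. Both come installed with the open-nlp 0.1.4 gem. I don't know if this is the root cause of the problem, but this file is at the core of where it breaks, and we have a major discrepancy. Which should we use?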

import java.util.ArrayList;
import java.util.Arrays;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.postag.POSTagger;

public class Utils
{

    public Utils()
    {
    }

    public static String[] tagWithArrayList(POSTagger postagger, ArrayList aarraylist[])
    {
        return postagger.tag(getStringArray(aarraylist));
    }

    public static Object[] findWithArrayList(NameFinderME namefinderme, ArrayList aarraylist[])
    {
        return namefinderme.find(getStringArray(aarraylist));
    }

    public static Object[] chunkWithArrays(ChunkerME chunkerme, ArrayList aarraylist[], ArrayList aarraylist1[])
    {
        return chunkerme.chunk(getStringArray(aarraylist), getStringArray(aarraylist1));
    }

    public static String[] getStringArray(ArrayList aarraylist[])
    {
        String as[] = (String[])Arrays.copyOf(aarraylist, aarraylist.length, [Ljava/lang/String;);
        return as;
    }
}

Utils.java in use as of 10/07, compiled and compressed into utils.jar:

import java.util.Arrays;
import java.util.ArrayList;
import java.lang.String;
import opennlp.tools.postag.POSTagger;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.namefind.NameFinderME; // interface instead?
import opennlp.tools.util.Span;

// javac -cp '.:opennlp.tools.jar' Utils.java
// jar cf utils.jar Utils.class
public class Utils {

    public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
      return posTagger.tag(getStringArray(objectArray));
    }
    public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
      return nameFinder.find(getStringArray(tokens));
    }
    public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
      return chunker.chunk(getStringArray(tokens), getStringArray(tags));
    }
    public static String[] getStringArray(ArrayList[] objectArray) {
      String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
          return stringArray;
    }
}

The failure occurs in BindIt::Binding::load_klass, at line 110 here:

# Private function to load classes.
# Doesn't check if initialized.
def load_klass(klass, base, name=nil)
  base += '.' unless base == ''
  fqcn = "#{base}#{klass}"
  name ||= klass
  if RUBY_PLATFORM =~ /java/
    rb_class = java_import(fqcn)
    if name != klass
      if rb_class.is_a?(Array)
        rb_class = rb_class.first
      end
      const_set(name.intern, rb_class)
    end
  else
    rb_class = Rjb::import(fqcn)             # <== This is line 110
    const_set(name.intern, rb_class)
  end
end

The messages are as follows, but they are inconsistent in terms of the specific class identified. Each run may show a different one: any of POSTagger, ChunkerME, or NameFinderME.

D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `import': opennlp/tools/namefind/NameFinderME (NoClassDefFoundError)
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `load_klass'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:89:in `block in load_default_classes'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `each'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `load_default_classes'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:56:in `bind'
    from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp.rb:14:in `load'
    from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:54:in `<class:OpennlpTryer>'
    from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
    from -e:1:in `load'
    from -e:1:in `<main>'

The interesting thing about these errors is that they originate at OpennlpTryer line 54, which is:

  OpenNLP.load

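At this point, OpenNLP fires up RJB, which uses BindIt to load the jars and classes. This is well before the errors I was seeing at the beginning of this question. However, I can't help but think it is all related. I really don't understand the inconsistency of these errors at all.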

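I was able to add the logging function to Utils.java, compile it after adding "import java.io.*", and compress it. However, I pulled it back out because of these errors, as I don't know whether it is involved. I don't think it is. In any case, since these errors occur during the load process, the method is never called, so logging there won't help...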

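For each of the other jars, the jar is loaded and then each class is imported using RJB. Utils is handled differently and is specified as the "default". From what I can tell, Utils.class is executed to load its own classes?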

Later update on 10/07:

Here is where I am, I think. First, as I described earlier today, I am having a problem replacing Utils.java. That will probably need to be solved before I can install a fix.

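Second, I now understand the difference between POSTagger and POSTaggerME: the ME stands for Maximum Entropy. The test code is trying to call POSTaggerME, but it looks to me like Utils.java, as implemented, supports POSTagger. I tried changing the test code to call POSTagger, but it said it couldn't find an initializer. Looking at the source for each, and I am guessing here, I think POSTagger exists for the sole purpose of supporting POSTaggerME, which implements it.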

The source is in the opennlp-tools file opennlp-tools-1.5.2-incubating-sources.jar.

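What I don't get is the whole reason for Utils in the first place. Why aren't the jars/classes provided in bindings.rb enough? This feels like a bad monkeypatch. I mean, look at what bindings.rb does in the first place: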

  # Default JARs to load.
  self.default_jars = [
    'jwnl-1.3.3.jar',
    'opennlp-tools-1.5.2-incubating.jar',
    'opennlp-maxent-3.0.2-incubating.jar',
    'opennlp-uima-1.5.2-incubating.jar'
  ]

  # Default namespace.
  self.default_namespace = 'opennlp.tools'

  # Default classes.
  self.default_classes = [
    # OpenNLP classes.
    ['AbstractBottomUpParser', 'opennlp.tools.parser'],
    ['DocumentCategorizerME', 'opennlp.tools.doccat'],
    ['ChunkerME', 'opennlp.tools.chunker'],
    ['DictionaryDetokenizer', 'opennlp.tools.tokenize'],
    ['NameFinderME', 'opennlp.tools.namefind'],
    ['Parser', 'opennlp.tools.parser.chunking'],
    ['Parse', 'opennlp.tools.parser'],
    ['ParserFactory', 'opennlp.tools.parser'],
    ['POSTaggerME', 'opennlp.tools.postag'],
    ['SentenceDetectorME', 'opennlp.tools.sentdetect'],
    ['SimpleTokenizer', 'opennlp.tools.tokenize'],
    ['Span', 'opennlp.tools.util'],
    ['TokenizerME', 'opennlp.tools.tokenize'],

    # Generic Java classes.
    ['FileInputStream', 'java.io'],
    ['String', 'java.lang'],
    ['ArrayList', 'java.util']
  ]

  # Add in Rjb workarounds.
  unless RUBY_PLATFORM =~ /java/
    self.default_jars << 'utils.jar'
    self.default_classes << ['Utils', '']
  end

2 Answers


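I don't think you're doing anything wrong at all. You're also not the only one with this problem. It looks like a bug in Utils. Creating an ArrayList[] in Java doesn't make much sense: it's technically legal, but it would be an array of ArrayLists, which a) is just plain odd, b) is terrible practice with regard to Java generics, and c) won't convert properly to String[] the way the author intends in getStringArray().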

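Given the way the utility is written, and the fact that OpenNLP does indeed expect to receive a String[] as input to its tag() method, my best guess is that the original author meant to have Object[] where they have ArrayList[] in the Utils class.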

Update

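To output to a file in the root of your project directory, try adjusting the logging like this (I added another line that prints the contents of the input array):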

try {
    File log = new File("log.txt");
    FileWriter fileWriter = new FileWriter(log);
    BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
    bufferedWriter.write("Tokens ("+objectArray.getClass().getSimpleName()+"): \r\n"+objectArray.toString()+"\r\n");
    bufferedWriter.write(Arrays.toString(objectArray));
    bufferedWriter.close(); 
}
catch (Exception e) {
    e.printStackTrace();
}
answered 2013-10-04T20:50:31.107

For the complete corrected CLASSES.RB module, see the full code at the end.

I ran into the same problem today. I didn't quite understand why the Utils class was being used, so I modified the classes.rb file in the following way:

unless RUBY_PLATFORM =~ /java/
  def tag(*args)
    @proxy_inst.tag(args[0])
    #OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0])
  end
end

After that I can pass the following test:

sent   = "The death of the poet was kept from his poems."
tokens = tokenizer.tokenize(sent).to_a
# => %w[The death of the poet was kept from his poems .]
tags   = tagger.tag(tokens).to_a
# => ["prop", "prp", "n", "v-fin", "n", "adj", "prop", "v-fin", "n", "adj", "punc"]

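R_G edit: I tested that change and it eliminated the error. I will have to do more testing to make sure the results are what is expected. However, following the same pattern, I also made the following changes in classes.rb: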

def chunk(tokens, tags)
  chunks = @proxy_inst.chunk(tokens, tags)
  # chunks = OpenNLP::Bindings::Utils.chunkWithArrays(@proxy_inst, tokens,tags)
  chunks.map { |c| c.to_s }
end

...

class OpenNLP::NameFinderME < OpenNLP::Base
  unless RUBY_PLATFORM =~ /java/
    def find(*args)
      @proxy_inst.find(args[0])
      # OpenNLP::Bindings::Utils.findWithArrayList(@proxy_inst, args[0])
    end
  end
end

This allowed the entire sample test to execute without failure. I will provide a later update regarding verification of the results.

Final edit and updated CLASSES.RB, per Space Pope and R_G:

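As it turns out, this answer was key to the desired solution. However, the results were inconsistent with only that correction in place. We continued to drill down and implemented strong typing in the calls, as specified by RJB. This converts each call to use the _invoke method, whose parameters are the desired method name, the type signature, and the arguments to pass. Andre's recommendation was key to the solution, so kudos to him. Here is the complete module. It eliminates the need for Utils.class, which was attempting to make these calls but failing. We plan to issue a github pull request for the open-nlp gem to update this module: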

require 'open-nlp/base'

class OpenNLP::SentenceDetectorME < OpenNLP::Base; end

class OpenNLP::SimpleTokenizer < OpenNLP::Base; end

class OpenNLP::TokenizerME < OpenNLP::Base; end

class OpenNLP::POSTaggerME < OpenNLP::Base

  unless RUBY_PLATFORM =~ /java/
    def tag(*args)
        @proxy_inst._invoke("tag", "[Ljava.lang.String;", args[0])
    end

  end
end


class OpenNLP::ChunkerME < OpenNLP::Base

  if RUBY_PLATFORM =~ /java/

    def chunk(tokens, tags)
      if !tokens.is_a?(Array)
        tokens = tokens.to_a
        tags = tags.to_a
      end
      tokens = tokens.to_java(:String)
      tags = tags.to_java(:String)
      @proxy_inst.chunk(tokens,tags).to_a
    end

  else

    def chunk(tokens, tags)
      chunks = @proxy_inst._invoke("chunk", "[Ljava.lang.String;[Ljava.lang.String;", tokens, tags)
      chunks.map { |c| c.to_s }
    end

  end

end

class OpenNLP::Parser < OpenNLP::Base

  def parse(text)

    tokenizer = OpenNLP::TokenizerME.new
    full_span = OpenNLP::Bindings::Span.new(0, text.size)

    parse_obj = OpenNLP::Bindings::Parse.new(
    text, full_span, "INC", 1, 0)

    tokens = tokenizer.tokenize_pos(text)

    tokens.each_with_index do |tok,i|
      start, stop = tok.get_start, tok.get_end
      token = text[start..stop-1]
      span = OpenNLP::Bindings::Span.new(start, stop)
      parse = OpenNLP::Bindings::Parse.new(text, span, "TK", 0, i)
      parse_obj.insert(parse)
    end

    @proxy_inst.parse(parse_obj)

  end

end

class OpenNLP::NameFinderME < OpenNLP::Base
  unless RUBY_PLATFORM =~ /java/
    def find(*args)
      @proxy_inst._invoke("find", "[Ljava.lang.String;", args[0])
    end
  end
end
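
The type strings passed to _invoke in the module above use the JVM's array notation: "[Ljava.lang.String;" denotes String[], and the two-argument chunk call concatenates one such string per parameter. The sketch below builds those strings; `java_array_signature` is a hypothetical helper, not part of Rjb or open-nlp, and it only covers one-dimensional arrays of a single class.

```ruby
# Hypothetical helper: builds the signature string for `count` parameters,
# each a one-dimensional Java array of `class_name`.
def java_array_signature(class_name, count = 1)
  "[L#{class_name};" * count
end

puts java_array_signature("java.lang.String")     # [Ljava.lang.String;
puts java_array_signature("java.lang.String", 2)  # [Ljava.lang.String;[Ljava.lang.String;
```

The two outputs match the literals used for tag/find and for chunk respectively.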
answered 2013-10-08T16:08:07.697