I want to convert an array of tokens produced by a PTB-style tokenizer:

["The", "house", "is", "n't", "on", "fire", "."]

into a sentence:

"The house isn't on fire."

What is a sensible way to accomplish this?

1 Answer

If we take @sawa's suggestion about the apostrophe and make your array:

["The", "house", "isn't", "on", "fire", "."]

you can get what you want (with punctuation support!) like this:

def sentence(array)
  str = ""
  array.each_with_index do |w, i|
    case w
    when '.', '!', '?' # Sentence enders: add a trailing space only if more tokens follow.
      str << w
      str << ' ' unless i == array.length - 1
    when ',', ';' # Inline separators: always followed by a space.
      str << w
      str << ' '
    when '--' # Dash: padded with a space on both sides.
      str << ' -- '
    else # A word: prefix a space unless at the start or right after a space.
      str << ' ' unless str[-1] == ' ' || str.length == 0
      str << w
    end
  end
  str
end
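The method above assumes the contraction has already been joined into "isn't". If you would rather start from the raw tokenizer output, a minimal sketch of that preprocessing step could look like the following. `merge_clitics` is a hypothetical helper name, and the list of clitics it recognizes (n't, 's, 're, 've, 'll, 'd, 'm) is an assumption about what your tokenizer splits off, so adjust it to your data.

```ruby
# Hypothetical helper (not part of the original answer): glue PTB-style
# clitic tokens such as "n't" or "'s" onto the token that precedes them.
# The clitic list is an assumption; extend it to match your tokenizer.
CLITIC = /\A(?:n't|'(?:s|re|ve|ll|d|m))\z/i

def merge_clitics(tokens)
  tokens.each_with_object([]) do |tok, out|
    if !out.empty? && tok.match?(CLITIC)
      out[-1] += tok # builds a new string, so the input array is not mutated
    else
      out << tok
    end
  end
end

merge_clitics(["The", "house", "is", "n't", "on", "fire", "."])
# => ["The", "house", "isn't", "on", "fire", "."]
```

The result can then be passed straight to `sentence`, for example `sentence(merge_clitics(tokens))`.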
answered 2013-03-29T21:50:34.647