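I want to search a given string for substrings. Each time a substring from the dictionary is contained in the input string, I append it to an array. At the end, I want to tally that array to count how many times each substring occurs.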

The problem is that my code adds each dictionary substring to new_array only once.

For example:

dictionary = ["below", "down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]

substrings("go going", dictionary)

should output:

{"go"=>2, "going"=>1, "i"=>1}

but I get:

{"go"=>1, "going"=>1, "i"=>1}

Here is my code:

def substrings(word, array) 

  new_array = []

  array.each do |index| 

    if word.downcase.include? (index)

      new_array << index

    end
  end

  puts new_array.tally

end

 dictionary = ["below", "down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]

 substrings("go going", dictionary)

7 Answers

1

It depends on how large your dictionary is.

You can map every dictionary element that occurs in the word to its number of occurrences.

dictionary.map {|w| [w,word.scan(w).size] if word.include?(w)}.compact.to_h
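
The one-liner can be wrapped in a method for reuse. This is only a minimal sketch, assuming Ruby 2.7 or later for filter_map. The method name substring_counts is made up for illustration, and downcasing the input mirrors the question's code rather than this answer.

```ruby
# Hypothetical helper: maps each dictionary entry found in `text`
# to its number of non-overlapping occurrences.
def substring_counts(text, dictionary)
  haystack = text.downcase
  dictionary.filter_map do |entry|
    count = haystack.scan(entry).size
    # Keep only entries that occur at least once.
    [entry, count] if count.positive?
  end.to_h
end

dictionary = ["below", "down", "go", "going", "horn", "how", "howdy",
              "it", "i", "low", "own", "part", "partner", "sit"]

substring_counts("go going", dictionary)
  #=> {"go"=>2, "going"=>1, "i"=>1}
```

Counting first and filtering on the count avoids the separate include? check, so the string is scanned once per entry.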
answered 2020-06-17T17:01:40.210
0

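As I understand it, we are given an array dictionary of words containing no whitespace and a string str, and are to construct a hash whose keys are elements of dictionary and whose values equal the number of non-overlapping [1] substrings of str that equal the key. The returned hash should exclude keys whose value is zero.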

This answer addresses the situation where, in the call

substrings(str, dictionary)

dictionary is large, str is not overly large (I elaborate on what that means later), and efficiency is important.

We begin by defining a helper method, whose purpose will become clear.

def substr_counts(str)
  str.split.each_with_object(Hash.new(0)) do |word,h|
    (1..word.size).each do |sub_len|
      (0..word.size-sub_len).each do |start_idx|
        h[word[start_idx,sub_len]] += 1
      end
    end
  end
end       

For the example given in the question,

substr_counts("go going")
  #=> {"g"=>3, "o"=>2, "go"=>2, "i"=>1, "n"=>1, "oi"=>1, "in"=>1, "ng"=>1,
  #    "goi"=>1, "oin"=>1, "ing"=>1, "goin"=>1, "oing"=>1, "going"=>1}

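As shown, this method breaks str into words, computes every substring of each word, and returns a hash whose keys are the substrings and whose values are the total number of occurrences of each substring across all the words.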

The desired hash can now be constructed quickly.

def cover_count(str, dictionary)
  h = substr_counts(str)
  dictionary.each_with_object({}) do |word,g|
    g[word] = h[word] if h.key?(word)
  end
end

dictionary = ["below", "down", "go", "going", "horn", "how", "howdy", 
              "it", "i", "low", "own", "part", "partner", "sit"]

cover_count("go going", dictionary)
  #=> {"go"=>2, "going"=>1, "i"=>1}

Another example:

str = "lowner partnership lownliest"
cover_count(str, dictionary)
  #=> {"i"=>2, "low"=>2, "own"=>2, "part"=>1, "partner"=>1}     

Here,

substr_counts(str)
  #=> {"l"=>3, "o"=>2, "w"=>2, "n"=>3, "e"=>3, "r"=>3, "lo"=>2,
  #    ...
  #    "wnliest"=>1, "lownlies"=>1, "ownliest"=>1, "lownliest"=>1} 
substr_counts(str).size
  #=> 109

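There is an obvious trade-off here. If str is long, and especially if it contains long words [2], building h takes too long to justify the savings from not having to determine, for each word in dictionary, whether that word is contained in each word of str. If building h is worthwhile, however, the overall time saving could be substantial.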
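
To get a rough feel for when building h becomes expensive, note that a word of n characters has n * (n + 1) / 2 substrings, counting duplicates. The sketch below computes that upper bound on the work done by substr_counts. The helper name max_substrings is hypothetical and not part of this answer.

```ruby
# Hypothetical helper: upper bound on the number of substrings that
# substr_counts visits, summed over the whitespace-separated words of str.
def max_substrings(str)
  str.split.sum { |word| word.size * (word.size + 1) / 2 }
end

max_substrings("go going")                      #=> 18
max_substrings("antidisestablishmentarianism")  #=> 406
```

The bound grows quadratically with word length, which is why long words matter more than the total length of str.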

1. By "non-overlapping" I mean that if str equals 'bobobo' it contains one, not two, substrings 'bobo'.

2. substr_counts("antidisestablishmentarianism").size #=> 385, which is not so bad.
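
It may be worth checking footnote 1 against the helper as written. Because substr_counts visits every start index, it appears to count overlapping occurrences as well, whereas String#scan resumes searching after the end of each match. A quick sketch of the difference (the substr_counts definition is copied from above so the snippet runs on its own):

```ruby
# Copied from the answer above so this snippet is self-contained.
def substr_counts(str)
  str.split.each_with_object(Hash.new(0)) do |word, h|
    (1..word.size).each do |sub_len|
      (0..word.size - sub_len).each do |start_idx|
        h[word[start_idx, sub_len]] += 1
      end
    end
  end
end

substr_counts("bobobo")["bobo"]  #=> 2 (start indices 0 and 2 both count)
"bobobo".scan("bobo").size       #=> 1 (the second match would overlap the first)
```

For inputs such as "go going", where no dictionary word can overlap itself, the two approaches give the same counts.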

answered 2020-06-17T20:10:43.527
0

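Only "go", "going" and "i" from your dictionary are substrings of your phrase. Each of these words appears only once in the dictionary, so new_array contains exactly ["go", "going", "i"], which tallies to {"go"=>1, "going"=>1, "i"=>1}.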
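
The reason is that include? answers only yes or no, so each dictionary entry can be appended at most once per pass. A small sketch contrasting it with scan, using the phrase from the question:

```ruby
word = "go going"

word.include?("go")    #=> true (a boolean, not a count)
word.scan("go")        #=> ["go", "go"]
word.scan("go").size   #=> 2
```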

I assume you expected go to appear twice because it occurs twice in your phrase. In that case, you can change your method to

def substrings(word, array) 
  new_array = []
  array.each do |index| 
    word.scan(/#{index}/).each { new_array << index }
  end
  puts new_array.tally
end

word.scan(/#{index}/) returns every occurrence of the substring in the phrase.
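
One caveat, which does not affect the plain-word dictionary in the question: interpolating an entry into a regular expression makes any metacharacters in it active. A minimal sketch, using a made-up entry "a.b" purely for illustration, shows how Regexp.escape or a plain String pattern keeps the match literal:

```ruby
word  = "a.b axb"
entry = "a.b"   # hypothetical dictionary entry containing a regex metacharacter

word.scan(/#{entry}/).size                  #=> 2 ("." matches any character)
word.scan(/#{Regexp.escape(entry)}/).size   #=> 1
word.scan(entry).size                       #=> 1 (String patterns match literally)
```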

answered 2020-06-17T16:51:43.917
0

Another option is to use Array#product after splitting the phrase into words, so that you can use Enumerable#tally as you wanted:

word = "go going"
word.split.product(dictionary).select { |a, b| a.include? b }.map(&:last).tally

#=> {"go"=>2, "going"=>1, "i"=>1}

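The output differs when word = "gogoing", because the string is then split into a one-element array. So I can't say whether this is the behavior you are looking for.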
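
To make that difference concrete, here is a small sketch comparing the word-based approach with a scan-based count. The lambda names by_word and by_scan and the shortened dictionary are assumptions made for illustration. Array#to_h with a block needs Ruby 2.6 or later, and tally needs 2.7.

```ruby
dictionary = %w[go going i]   # shortened dictionary, for illustration only

# Counts the words of the text that contain each entry (at most once per word).
by_word = ->(text) do
  text.split.product(dictionary).select { |w, d| w.include?(d) }.map(&:last).tally
end

# Counts every non-overlapping occurrence of each entry in the whole text.
by_scan = ->(text) do
  dictionary.to_h { |d| [d, text.scan(d).size] }.reject { |_, n| n.zero? }
end

by_word.call("go going")  #=> {"go"=>2, "going"=>1, "i"=>1}
by_word.call("gogoing")   #=> {"go"=>1, "going"=>1, "i"=>1}
by_scan.call("gogoing")   #=> {"go"=>2, "going"=>1, "i"=>1}
```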

answered 2020-06-17T20:18:52.480
0

You have to count how many times each dictionary entry (index) appears in the string, so use scan:

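def substrings(word, array)
  hash = {}
  # Downcase once so that include? and scan see the same text.
  text = word.downcase
  array.each do |index|
    if text.include?(index)
      # scan returns every non-overlapping match; its length is the count.
      hash[index] = text.scan(index).length
    end
  end
  puts hash
end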
answered 2020-06-17T16:59:04.163
0

You can use scan to count how many times each substring occurs.

def substrings(word, array)
  output = {}
  array.each do |index|
    count_substring_appears = word.scan(index).size
    if count_substring_appears > 0
      output[index] = count_substring_appears
    end
  end

  output
end
answered 2020-06-17T16:50:44.893
0

I'd start with this:

dictionary = %w[down go going it i]
target = 'go going'

dictionary.flat_map { |w|
  target.scan(Regexp.new(w, Regexp::IGNORECASE))
}.reject(&:empty?).tally
# => {"go"=>2, "going"=>1, "i"=>1}
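
One thing to watch with IGNORECASE: scan returns the text as it appears in the target, so mixed-case input produces separate tally keys. A minimal sketch, assuming you want the counts merged under lowercase keys (the capitalized target is a made-up variation on the example):

```ruby
dictionary = %w[down go going it i]
target = 'Go going'   # hypothetical mixed-case input

matches = dictionary.flat_map { |w| target.scan(Regexp.new(w, Regexp::IGNORECASE)) }

matches.tally                   #=> {"Go"=>1, "go"=>1, "going"=>1, "i"=>1}
matches.map(&:downcase).tally   #=> {"go"=>2, "going"=>1, "i"=>1}
```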
answered 2020-06-18T00:18:16.653