arrays - Algorithm to find the most frequently occurring sequence of strings of length 3

Question

"Given multiple arrays of names, find the most frequently occurring sequence of names of length 3 (three consecutive names), if one exists."

For example, given these 3 arrays of names:

Ana John Maria
Paul
Sharon Ana John Maria Tiffany Ted

The output would be Ana John Maria, because this sequence occurs twice: once in the first array and once in the third.

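I can't seem to find a correct solution for this.

Can anyone point me in the right direction? Maybe this is a well-known algorithm. Can anyone give me a link? Thanks.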

score 4 · Accepted Answer

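Merge the arrays into a trie-like tree in which each node holds a full name rather than a single letter. This should make it easier to look up and count subsequences. In fact, I strongly suspect there is a standard algorithm for this task that you can look up.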

Update: look into algorithms that use a suffix tree: http://en.wikipedia.org/wiki/Suffix_tree
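The trie idea above could be sketched roughly as follows. This is only a minimal illustration, not a suffix tree and not necessarily what the answerer had in mind: it assumes the tree can be cut off at depth 3, since only sequences of length 3 matter, and the names NameTrie, Node, add, mostFrequent and mostFrequentCount are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a trie whose edges are whole names, truncated at depth 3.
class NameTrie {
    // One node per name on a path; count is only incremented on depth-3 nodes.
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        int count;
    }

    private static final int DEPTH = 3; // sequence length from the question

    private final Node root = new Node();
    private int bestCount = 0;
    private List<String> best = Collections.emptyList();

    /** Inserts every run of DEPTH consecutive names from one array. */
    public void add(List<String> names) {
        for (int start = 0; start + DEPTH <= names.size(); start++) {
            Node node = root;
            // Walk (and create if needed) the path for this window of names.
            for (int i = start; i < start + DEPTH; i++) {
                node = node.children.computeIfAbsent(names.get(i), k -> new Node());
            }
            node.count++;
            // Track the best sequence seen so far; ties keep the earlier one.
            if (node.count > bestCount) {
                bestCount = node.count;
                best = new ArrayList<>(names.subList(start, start + DEPTH));
            }
        }
    }

    /** Most frequent sequence, or an empty list if no array had DEPTH names. */
    public List<String> mostFrequent() {
        return best;
    }

    public int mostFrequentCount() {
        return bestCount;
    }

    public static void main(String[] args) {
        NameTrie trie = new NameTrie();
        trie.add(Arrays.asList("Ana", "John", "Maria"));
        trie.add(Arrays.asList("Paul"));
        trie.add(Arrays.asList("Sharon", "Ana", "John", "Maria", "Tiffany", "Ted"));
        System.out.println(trie.mostFrequent() + " x" + trie.mostFrequentCount());
    }
}
```

For the three arrays in the question this prints [Ana, John, Maria] x2. Sequences that share a prefix, such as Ana John Maria and Ana John Ted, would share the first two nodes, which is the main thing the tree buys over a flat table.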

score 2 · Accepted Answer

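A simple approach is to take every sequence of 3 consecutive names and put it into a hash table. Each time you encounter a sequence of 3, you increment its occurrence counter. At the end, return the most frequent sequence, which you find by scanning the hash table for the entry with the largest count. Example in Java: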

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Sequence {
    // Returns the most frequent run of 3 consecutive names,
    // or an empty list if no array contains at least 3 names.
    public List<String> sequenceOfThree(List<List<String>> names) {
        Map<List<String>, Integer> map = new HashMap<List<String>, Integer>();
        for (List<String> nameList : names) {
            int startIdx = 0;
            int endIdx = 3;
            // Slide a window of length 3 across the current array.
            while (endIdx <= nameList.size()) {
                // Copy the window: subList() only returns a view of nameList.
                List<String> subsequence =
                        new ArrayList<String>(nameList.subList(startIdx, endIdx));
                // Add to map, starting the counter at 0 on first sight.
                Integer count = map.get(subsequence);
                if (count == null) {
                    count = 0;
                }
                map.put(subsequence, count + 1);
                startIdx++;
                endIdx++;
            }
        }
        // Scan the map for the entry with the highest count.
        Integer max = Integer.MIN_VALUE;
        List<String> result = Collections.emptyList();
        for (Map.Entry<List<String>, Integer> entries : map.entrySet()) {
            if (entries.getValue() > max) {
                max = entries.getValue();
                result = entries.getKey();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> names = new ArrayList<List<String>>();
        names.add(Arrays.asList("Ana", "John", "Maria"));
        names.add(Arrays.asList("Paul"));
        names.add(Arrays.asList("Sharon", "Ana", "John", "Maria", "Tiffany", "Ted"));
        System.out.println(new Sequence().sequenceOfThree(names));
    }
}
