java - Getting a list of words from a Trie

Question

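Instead of checking whether a matching word exists in the Trie, I'd like to use the following code to return a list of all words that begin with a prefix entered by the user. Can anyone point me in the right direction? I can't get it working at all...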

public boolean search(String s)
{
    Node current = root;
    System.out.println("\nSearching for string: "+s);

    while(current != null)
    {
        for(int i=0;i<s.length();i++)
        {               
            if(current.child[(int)(s.charAt(i)-'a')] == null)
            {
                System.out.println("Cannot find string: "+s);
                return false;
            }
            else
            {
                current = current.child[(int)(s.charAt(i)-'a')];
                System.out.println("Found character: "+ current.content);
            }
        }
        // If we are here, the string exists.
        // But to ensure unwanted substrings are not found:

        if (current.marker == true)
        {
            System.out.println("Found string: "+s);
            return true;
        }
        else
        {
            System.out.println("Cannot find string: "+s +"(only present as a substring)");
            return false;
        }
    }

    return false; 
}

}

score 10 · Accepted Answer

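I ran into this problem while trying to build a text autocomplete module. I solved it by building a Trie in which each node holds a reference to its parent as well as to its children. First, I search for the node that corresponds to the input prefix. Then I run a traversal over the Trie that explores every node of the subtree rooted at that prefix node. Whenever a node marked as the end of a word is encountered (isLeaf in the code below), a word starting with the input prefix has been found. From that node I walk up through the parent references until I reach the root of the subtree, pushing each node's character onto a stack along the way. Finally, I take the prefix and append the characters to it by popping the stack. Each completed word is saved in an ArrayList. When the traversal finishes, I have all the words that start with the input prefix.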

import java.util.*;  // ArrayList, HashMap, Map, Stack, Set, Iterator

class TrieNode
{
    char c;
    TrieNode parent;
    HashMap<Character, TrieNode> children = new HashMap<Character, TrieNode>();
    boolean isLeaf;

    public TrieNode() {}
    public TrieNode(char c){this.c = c;}
}


public class Trie
{
    private TrieNode root;
    ArrayList<String> words; 
    TrieNode prefixRoot;
    String curPrefix;

    public Trie()
    {
        root = new TrieNode();
        words  = new ArrayList<String>();
    }

    // Inserts a word into the trie.
    public void insert(String word) 
    {
        HashMap<Character, TrieNode> children = root.children;

        TrieNode crntparent;

        crntparent = root;

        //cur children parent = root

        for(int i=0; i<word.length(); i++)
        {
            char c = word.charAt(i);

            TrieNode t;
            if(children.containsKey(c)){ t = children.get(c);}
            else
            {
            t = new TrieNode(c);
            t.parent = crntparent;
            children.put(c, t);
            }

            children = t.children;
            crntparent = t;

            //set leaf node
            if(i==word.length()-1)
                t.isLeaf = true;    
        }
    }

    // Returns if the word is in the trie.
    public boolean search(String word)
    {
        TrieNode t = searchNode(word);
        if(t != null && t.isLeaf){return true;}
        else{return false;}
    }

    // Returns if there is any word in the trie
    // that starts with the given prefix.
    public boolean startsWith(String prefix) 
    {
        if(searchNode(prefix) == null) {return false;}
        else{return true;}
    }

    public TrieNode searchNode(String str)
    {
        Map<Character, TrieNode> children = root.children; 
        TrieNode t = null;
        for(int i=0; i<str.length(); i++)
        {
            char c = str.charAt(i);
            if(children.containsKey(c))
            {
                t = children.get(c);
                children = t.children;
            }
            else{return null;}
        }

        prefixRoot = t;
        curPrefix = str;
        words.clear();
        return t;
    }


    ///////////////////////////


  void wordsFinderTraversal(TrieNode node, int offset) 
  {
        //  print(node, offset);

        if(node.isLeaf==true)
        {
          //println("leaf node found");

          TrieNode altair;
          altair = node;

          Stack<String> hstack = new Stack<String>(); 

          while(altair != prefixRoot)
          {
            //println(altair.c);
            hstack.push( Character.toString(altair.c) );
            altair = altair.parent;
          }

          String wrd = curPrefix;

          while(hstack.empty()==false)
          {
            wrd = wrd + hstack.pop();
          }

          //println(wrd);
          words.add(wrd);

        }

         // here you can play with the order of the children
         for (TrieNode child : node.children.values())
         {
           wordsFinderTraversal(child, offset + 2);
         }

  }


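 void displayFoundWords()
 {
   // System.out.println works in plain Java as well as in Processing
   System.out.println("_______________");
   for(int i=0;i<words.size();i++)
   {
     System.out.println(words.get(i));
   }
   System.out.println("________________");

 }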



}//

Example

Trie prefixTree;

prefixTree = new Trie();  

  prefixTree.insert("GOING");
  prefixTree.insert("GONG");
  prefixTree.insert("PAKISTAN");
  prefixTree.insert("SHANGHAI");
  prefixTree.insert("GONDAL");
  prefixTree.insert("GODAY");
  prefixTree.insert("GODZILLA");

  if( prefixTree.startsWith("GO")==true)
  {
    TrieNode tn = prefixTree.searchNode("GO");
    prefixTree.wordsFinderTraversal(tn,0);
    prefixTree.displayFoundWords(); 

  }

  if( prefixTree.startsWith("GOD")==true)
  {
    TrieNode tn = prefixTree.searchNode("GOD");
    prefixTree.wordsFinderTraversal(tn,0);
    prefixTree.displayFoundWords(); 

  }

score 7 · Accepted Answer

Once the Trie is built, you can run a DFS starting from the node where the prefix was found:

Here node is the Trie node, word is the word built so far, and res is the list of words.

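END_OF_WORD = "#"  # assumed sentinel key marking the end of a word

def dfs(self, node, word, res):
    # node is assumed to be a dict mapping each character to its child dict.
    # When a word ends at this node, add the current word to the list.
    # Do not return here: longer words may continue below this node.
    if END_OF_WORD in node:
        res.append(word)
    # Go deeper, DFS fashion, appending each character to the current word.
    for ch, child in node.items():
        if ch != END_OF_WORD:
            self.dfs(child, word + ch, res)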

score 6 · Accepted Answer

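The simplest solution is to use a depth-first search.

You walk down the trie, matching the input letter by letter. Then, once you have no more letters to match, everything under that node is a string you want. Recursively explore that whole subtrie, building up the string as you descend to each of its nodes.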
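The two steps described above could look roughly like the following. This is only a sketch: the Node type is modeled on the question's array-based node (a child array plus a marker flag), the class and method names are made up for illustration, and it assumes every word and prefix uses only the lowercase letters 'a' to 'z'.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixDfs {
    // Hypothetical node type modeled on the question's Node: one slot per letter 'a'..'z'.
    static class Node {
        Node[] child = new Node[26];
        boolean marker; // true when a word ends at this node
    }

    // Illustrative insert helper so the sketch can be run on its own.
    static void insert(Node root, String word) {
        Node current = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a';
            if (current.child[idx] == null) {
                current.child[idx] = new Node();
            }
            current = current.child[idx];
        }
        current.marker = true;
    }

    static List<String> wordsWithPrefix(Node root, String prefix) {
        List<String> result = new ArrayList<>();
        Node current = root;
        // Step 1: walk down the trie, matching the prefix letter by letter.
        for (int i = 0; i < prefix.length(); i++) {
            current = current.child[prefix.charAt(i) - 'a'];
            if (current == null) {
                return result; // no word starts with this prefix
            }
        }
        // Step 2: everything under this node is a match, so explore the subtrie.
        collect(current, new StringBuilder(prefix), result);
        return result;
    }

    // Recursive DFS that builds each string while descending.
    static void collect(Node node, StringBuilder path, List<String> result) {
        if (node.marker) {
            result.add(path.toString());
        }
        for (int i = 0; i < node.child.length; i++) {
            if (node.child[i] != null) {
                path.append((char) ('a' + i));
                collect(node.child[i], path, result);
                path.setLength(path.length() - 1); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String w : new String[] {"go", "going", "gong", "god", "shanghai"}) {
            insert(root, w);
        }
        System.out.println(wordsWithPrefix(root, "go")); // [go, god, going, gong]
    }
}
```

Because the children are visited in index order, the words come back alphabetically. The prefix itself is included when it is a word in its own right.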

score 1 · Accepted Answer

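In my opinion this is easier to solve recursively. It would go something like this:

Write a recursive function Print that prints all the words in the trie rooted at the node you pass as a parameter. The Wiki tells you how to do this (look at sorting).
Find the last character of the prefix, and the node labeled with that character, going down from the root of the trie. Call the Print function with that node as the parameter. Then just make sure you also output the prefix before each word, since this gives you all the words without their prefix.

If you don't really care about efficiency, you can just run Print from the main root node and only print the words that start with the prefix you are interested in. This is easier to implement but slower.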
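As a rough illustration of the simpler but slower variant, the sketch below enumerates every word in the trie and keeps only those that start with the prefix. The Node type, the insert helper, and all names are assumptions made for this example. It collects into a list rather than printing, and uses a TreeMap so the output order is predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FilterAllWords {
    // Hypothetical node type: children keyed by character, sorted for stable output.
    static class Node {
        Map<Character, Node> children = new TreeMap<>();
        boolean end; // true when a word ends at this node
    }

    // Illustrative insert helper so the sketch can be run on its own.
    static void insert(Node root, String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.computeIfAbsent(c, k -> new Node());
        }
        current.end = true;
    }

    // The recursive "Print" idea, collecting into a list instead of printing.
    static void collectAll(Node node, String soFar, List<String> out) {
        if (node.end) {
            out.add(soFar);
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            collectAll(e.getValue(), soFar + e.getKey(), out);
        }
    }

    // Slow variant: visit the whole trie, then keep only the matching words.
    static List<String> wordsWithPrefixSlow(Node root, String prefix) {
        List<String> all = new ArrayList<>();
        collectAll(root, "", all);
        List<String> matches = new ArrayList<>();
        for (String w : all) {
            if (w.startsWith(prefix)) {
                matches.add(w);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String w : new String[] {"world", "work", "wolf", "life", "love"}) {
            insert(root, w);
        }
        System.out.println(wordsWithPrefixSlow(root, "wo")); // [wolf, work, world]
    }
}
```

The cost is proportional to the size of the whole trie on every query, which is why starting the recursion at the prefix node is usually preferable.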

score 1 · Accepted Answer

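You need to traverse the subtree starting at the node found for the prefix.

Start the same way, i.e. by finding the right node. Then, instead of checking its marker, traverse that tree (i.e. go over all of its descendants; a DFS is a good way to do it), keeping track of the substring used to get from the first node to the "current" node.

If the current node is marked as a word, output* the prefix + the substring reached so far.

* Or add it to a list or something.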
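One way to keep the "substring used to reach the current node" next to each node is an explicit stack instead of recursion. The sketch below takes that route; it is an iterative variant chosen for illustration, not something the answer prescribes, and the Node, Frame, and method names are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SuffixTraversal {
    // Hypothetical node type with an end-of-word marker.
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean marker;
    }

    // Pairs a node with the substring used to reach it from the prefix node.
    static class Frame {
        final Node node;
        final String suffix;
        Frame(Node node, String suffix) {
            this.node = node;
            this.suffix = suffix;
        }
    }

    // Illustrative insert helper so the sketch can be run on its own.
    static void insert(Node root, String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.computeIfAbsent(c, k -> new Node());
        }
        current.marker = true;
    }

    static List<String> wordsWithPrefix(Node root, String prefix) {
        List<String> words = new ArrayList<>();
        // Find the node for the prefix, exactly as a plain search would.
        Node start = root;
        for (char c : prefix.toCharArray()) {
            start = start.children.get(c);
            if (start == null) {
                return words;
            }
        }
        // Iterative DFS over all descendants of the prefix node.
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(start, ""));
        while (!stack.isEmpty()) {
            Frame f = stack.pop();
            if (f.node.marker) {
                words.add(prefix + f.suffix); // prefix + substring reached so far
            }
            for (Map.Entry<Character, Node> e : f.node.children.entrySet()) {
                stack.push(new Frame(e.getValue(), f.suffix + e.getKey()));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String w : new String[] {"GOING", "GONG", "PAKISTAN", "GODAY", "GODZILLA"}) {
            insert(root, w);
        }
        List<String> found = wordsWithPrefix(root, "GOD");
        Collections.sort(found); // HashMap iteration order is unspecified
        System.out.println(found); // [GODAY, GODZILLA]
    }
}
```

An explicit stack avoids deep recursion on very long words, at the price of allocating one small Frame object per visited node.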

score 1 · Accepted Answer

I once built a trie for one of the ITA puzzles

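import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class WordTree {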


class Node {

    private final char ch;

    /**
     * Flag indicates that this node is the end of the string.
     */
    private boolean end;

    private LinkedList<Node> children;

    public Node(char ch) {
        this.ch = ch;
    }

    public void addChild(Node node) {
        if (children == null) {
            children = new LinkedList<Node>();
        }
        children.add(node);
    }

    public Node getNode(char ch) {
        if (children == null) {
            return null;
        }
        for (Node child : children) {
            if (child.getChar() == ch) {
                return child;
            }
        }
        return null;
    }

    public char getChar() {
        return ch;
    }

    public List<Node> getChildren() {
        if (this.children == null) {
            return Collections.emptyList();
        }
        return children;
    }

    public boolean isEnd() {
        return end;
    }

    public void setEnd(boolean end) {
        this.end = end;
    }
}


Node root = new Node(' ');

public WordTree() {
}

/**
 * Searches for strings that match the prefix.
 *
 * @param prefix - prefix
 * @return - list of strings that match the prefix, or an empty list if no matches are found.
 */
public List<String> getWordsForPrefix(String prefix) {
    if (prefix.length() == 0) {
        return Collections.emptyList();
    }
    Node node = getNodeForPrefix(root, prefix);
    if (node == null) {
        return Collections.emptyList();
    }
    List<LinkedList<Character>> chars = collectChars(node);
    List<String> words = new ArrayList<String>(chars.size());
    for (LinkedList<Character> charList : chars) {
        words.add(combine(prefix.substring(0, prefix.length() - 1), charList));
    }
    return words;
}


private String combine(String prefix, List<Character> charList) {
    StringBuilder sb = new StringBuilder(prefix);
    for (Character character : charList) {
        sb.append(character);
    }
    return sb.toString();
}


private Node getNodeForPrefix(Node node, String prefix) {
    if (prefix.length() == 0) {
        return node;
    }
    Node next = node.getNode(prefix.charAt(0));
    if (next == null) {
        return null;
    }
    return getNodeForPrefix(next, prefix.substring(1, prefix.length()));
}


private List<LinkedList<Character>> collectChars(Node node) {
    List<LinkedList<Character>> chars = new ArrayList<LinkedList<Character>>();

    if (node.getChildren().size() == 0) {
        chars.add(new LinkedList<Character>(Collections.singletonList(node.getChar())));
    } else {
        if (node.isEnd()) {
            chars.add(new LinkedList<Character>(
                    Collections.singletonList(node.getChar())));
        }
        List<Node> children = node.getChildren();
        for (Node child : children) {
            List<LinkedList<Character>> childList = collectChars(child);
            for (LinkedList<Character> characters : childList) {
                characters.push(node.getChar());
                chars.add(characters);
            }
        }
    }
    return chars;
}


public void addWord(String word) {
    addWord(root, word);
}

private void addWord(Node parent, String word) {
    if (word.trim().length() == 0) {
        return;
    }
    Node child = parent.getNode(word.charAt(0));
    if (child == null) {
        child = new Node(word.charAt(0));
        parent.addChild(child);
    }
    if (word.length() == 1) {
        child.setEnd(true);
    } else {
        addWord(child, word.substring(1, word.length()));
    }
}


public static void main(String[] args) {
    WordTree tree = new WordTree();
    tree.addWord("world");
    tree.addWord("work");
    tree.addWord("wolf");
    tree.addWord("life");
    tree.addWord("love");
    System.out.println(tree.getWordsForPrefix("wo"));
}

}

score 0 · Accepted Answer

You need to use a List:
List<String> myList = new ArrayList<String>();
if (matchingStringFound) myList.add(stringToAdd);

score 0 · Accepted Answer

After your for loop, add a call to printAllStringsInTrie(current, s);

void printAllStringsInTrie(Node t, String prefix) {
  // marker is the end-of-word flag on the question's Node
  if (t.marker) System.out.println(prefix);
  for (int i = 0; i < t.child.length; i++) {
    if (t.child[i] != null) {
      // 'a' + i is an int, so cast it back to char before appending it to the String
      printAllStringsInTrie(t.child[i], prefix + (char) ('a' + i));
    }
  }
}

score 0 · Accepted Answer

A simple recursive DFS algorithm can be used to find all words for a given prefix.

Sample Trie node:

static class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>();
    boolean isWord = false;
}

Methods to find all words for a given prefix:

static List<String> findAllWordsForPrefix(String prefix, TrieNode root) {
    List<String> words = new ArrayList<>();
    TrieNode current = root;
    for(Character c: prefix.toCharArray()) {
        TrieNode nextNode = current.children.get(c);
        if(nextNode == null) return words;
        current = nextNode;
    }
    if(!current.children.isEmpty()) {
        findAllWordsForPrefixRecursively(prefix, current, words);
    } else {
        if(current.isWord) words.add(prefix);
    }
    return words;
}

static void findAllWordsForPrefixRecursively(String prefix, TrieNode node, List<String> words) {
    if(node.isWord) words.add(prefix);
    if(node.children.isEmpty()) {
        return;
    }
    for(Character c: node.children.keySet()) {
        findAllWordsForPrefixRecursively(prefix + c, node.children.get(c), words);
    }
}

The complete code can be found here: TrieDataStructure example

score 0 · Accepted Answer

The recursive code below can be used when your TrieNode looks like this. This code works fine.

TrieNode(char c)
{

        this.con=c;
        this.isEnd=false;
        list=new ArrayList<TrieNode>();
        count=0;

}

//--------------------------------------------------

public void Print(TrieNode root1, ArrayList<Character> path)
{

      if(root1==null)
          return;

      if(root1.isEnd==true)
      {
          //print the entire path
          ListIterator<Character> itr1=path.listIterator();
          while(itr1.hasNext())
          {
              System.out.print(itr1.next());
          }
          System.out.println();
          // no return here: longer words may continue below this node
      }

      // recurse into every child, extending the path and then backtracking
      ListIterator<TrieNode> itr=root1.list.listIterator();
      while(itr.hasNext())
      {
          TrieNode child=itr.next();
          path.add(child.con);
          Print(child,path);
          path.remove(path.size()-1);
      }
}
