1

Inspired by iOS7 iMessage's next-word-prediction, I've decided to try to write a script that will learn, based on user input, which words / letters are most likely wanted to complete the user's current word or which word might most likely be desired next.

To do this, I'm going to use a data structure very similar to a Radix Tree (AKA a Patricia Trie).

Take this user input for example:

I like icecream

From that, my goal is to generate the following data structure:

var speakData = {
    "I": { //the key
        value: "I", //the stored value for this unit of the combination
        count: 1, //the number of times that this combination has occured
        followables: { // the next level of the tree; all combinations 
                       // that might follow this one
            " ": {
                value: " ",
                count: 1,
                followables: {
                    "l": {
                        value: "l",
                        count: 1,
                        followables: {
                            "i": {
                                value: "i",
                                count: 1,
                                followables: {
                                    "k": {
                                        value: "k",
                                        count: 1,
                                        followables: {
                                            "e": {
                                                value: "e",
                                                count: 1,
                                                followables: {
                                                    // and so on 
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

This is essentially a Radix Tree, with some extra information, allowing me to weigh the probability of the learned possibilities that the user might want to type next.

From the above extremely limited data set, when the user types an "I", our best (and only) guess is that the next character will be a " ".

So now that I've explained my goal and method, here's my question:

How can I build this data structure from any given user input?

function learn(message, brain){
    for(var i = 0; i < message.length; i++){
        brain[message[i]] = {};
        brain[message[i]].value = message[i];
        brain[message[i]].count++;
        brain[message[i]].followables = 
    }
}

This is as far as I've gotten, but I'm not sure how to insert the next values at the proper positions recursively.

4

1 回答 1

1

只需制作一个简单的递归函数,如下所示:

function learn(message, brain){
  if(message.length == 0) return {}; // or do something else
  var ch = message[0]; // get the first character
  if(!brain[ch]) { // create new node when not exists
    brain[ch] = {value:ch,count:1,followables:{}};
  } else { // increment count when exist
    brain[ch].count += 1;
  }
  var substr = message.substring(1); // remove first character
  if(substr) { // do it for the remaining substring
    brain[ch].followables = learn(substr,brain[ch].followables)
  }
  return brain;
}

当然你也可以让它迭代,而不是递归。

// test code:
var brain = {};
brain = learn('test',brain);
brain = learn('testing',brain);
brain = learn('tes',brain);
brain = learn('yay',brain);
console.log(JSON.stringify(brain, null, 2));

会输出如下内容:

{
  "t": {
    "value": "t",
    "count": 3,
    "followables": {
      "e": {
        "value": "e",
        "count": 3,
        "followables": {
          "s": {
            "value": "s",
            "count": 3,
            "followables": {
              "t": {
                "value": "t",
                "count": 2,
                "followables": {
                  "i": {
                    "value": "i",
                    "count": 1,
                    "followables": {
                      "n": {
                        "value": "n",
                        "count": 1,
                        "followables": {
                          "g": {
                            "value": "g",
                            "count": 1,
                            "followables": {}
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "y": {
    "value": "y",
    "count": 1,
    "followables": {
      "a": {
        "value": "a",
        "count": 1,
        "followables": {
          "y": {
            "value": "y",
            "count": 1,
            "followables": {}
          }
        }
      }
    }
  }
}
于 2015-02-11T06:59:04.613 回答