c++ - 优化这个递归函数 [Boggle resolver]

Question

我实现了一个递归函数来解决这个问题：

使用由字母 (A...Z) 组成的 4x4 矩阵，对于每个字母 ，仅遵循相邻和对角单元格即可获得所有n 长度的单词。

我的函数的想法是对每个可能的方向的每个字母进行递归调用，然后连接将要形成的字符串。当这个字符串的长度为n时，它会停止。为了防止返回到已经使用的字母，我创建了一个指向字母类的指针数组，它将每个新的可能字母与所有使用的字母进行比较。为此，我为每个字母创建了一个特殊的类。

这是函数：（对不起，它很长）

dirSu 表示 Up - dirGiu Down - dirDx right - dirSx left - 和所有其他的对角线位置

    void decisione(lettera *start, lettera *next, int count, string seq, vector <lettera*> visitate, int quanti) {

        if (next == nullptr) return ;

        if (count == quanti) {
            bool test = false;

            for (int k = 0; k < start->sequenze.size(); k++) { //To prevent same letter to be used
                if (start->sequenze[k] == seq)  {
                    test = true;
                    break;
                }
            }
            if (!test) start->sequenze.push_back(seq);
            return ;
        }

        seq = seq + next->chiave; //String concatenating every call
        count++; //Counter to decide when stop
        visitate.push_back(next); //Array of used letters

        int i;
        bool test;

        //Decide direction
        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirSu) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirSu, count, seq, visitate, quanti);


        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirSu) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirSu, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirGiu) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirGiu, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirSx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirSx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirDx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirDx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirAdx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirAdx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirAsx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirAsx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirBdx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirBdx, count, seq, visitate, quanti);

        test = false;
        for(i = 0; i < visitate.size(); i++) {
            if (visitate[i] == next->dirBsx) { 
                test = true;
                break;
            }
        }
        if (!test) decisione(start, next->dirBsx, count, seq, visitate, quanti);

    }

好吧，当我激活3个字母的单词功能时，它真的很快，4个字母：矩阵中每个字母的计算时间为0.028s，5个字母：0.225s，6个字母：1.891s，7个字母：14.914s。

有没有办法通过优化我的函数来减少计算时间？（或删除我愚蠢的逻辑错误？）非常感谢！

我用于测试的矩阵是：

C A S A
C O S A
B A C I
O T A T

我在每次速度测试中都使用了这个（没有按频率或随机计算），并且有了这个（自从我的问题以来我的递归函数得到了改进），我在大约 90 秒内得到了每个起始字母有 5、6、7 个字母的所有单词。字很多，虽然只是一个4x4的矩阵！如果我可以私下将其发送给您，那么看看您对同一问题的解决方案可能会很有趣。（这里不知道怎么联系你）

score 2 · Accepted Answer

将您的算法从递归更改为迭代（使用循环）。它通常更快。之后，其他优化与算法和数据结构相关，以降低复杂度。找到这样的优化要困难得多（但有可能）。

编辑：对代码的一些想法，然后是我今天完成的单词搜索的一个版本：

你只想要英语单词（或意大利语单词等......）。要检查这一点，您应该使用字典，它是查找单词的最有效结构，无论是在内存还是性能方面
有评论说这个算法是强递归的。我不同意，我做了一个迭代的，向量的副本少得多，并且有单词检查。将算法从递归传递到迭代通常会减少副本的数量。但我仍然同意这个评论的一点：某些算法在递归形式中可能会更快（但它们并不常见）。此外，这个算法并不容易找到；）
你应该拆分你的代码。它将更具可读性，并且更易于优化。
您正在编写一个 boggle 解析器，因此您应该知道您的字典必须是精确的，并且应该包含所有形式的动词（例如，对于“be”：am、are、is、was、was 等... ）。我用法语测试它，因为我的英文单词列表只有 850 个单词，太短了。
字典中带有重音的字母的条件并不重要，它们在这里只是为了避免使用更多的重音插槽。此外，我从文本文件中删除了每个组成的单词（其中带有插入符号）。
在我的电脑上，加载字典（非详尽的 350000 个法语单词，3 Mb）并搜索所有 5,6 和 7 长度的单词大约需要 0.6 秒（i7 2660K，8Gb RAM，使用 /0x 发布，由我测量，而不是由计算机;））。性能测量仅与您的机器相关，以进行比较。

所以，这里是代码。它是独立的，您只需编译它，使用英语（或意大利语或法语）单词文件（文本，每行一个单词）执行：

#include <string>
#include <iostream>
#include <stdlib.h>
#include <vector>
#include <time.h>
#include <map>
#include <algorithm>

// **************************************************** //
//                                                      //
// Data structure for the words : dictionnary           //
//                                                      //
// **************************************************** //
struct DicNode
{
    const char l;
    bool isEndOfWord;
    DicNode* children[26]; // a - z, case insensitive
    DicNode* parent;
    unsigned int min, max, level;

    DicNode(char c, DicNode* p = 0, bool end = false) : l(c),isEndOfWord(end),parent(p),min((unsigned int)(-1)),max(0), level(parent ? parent->level + 1 : 0)
    {
        for(unsigned int t = 0; t < 26; t++)
            children[t] = 0;
    }

    ~DicNode()
    {
        for(unsigned int t = 0; t < 26; t++)
            if(children[t])
                delete children[t];
    }

    DicNode*& at(char l)
    {
        if(l == 'à' || l == 'ä' || l == 'â')
            l = 'a';
        if(l == 'é' || l == 'è' || l == 'ê' || l == 'ë')
            l = 'e';
        if(l == 'ï' || l == 'î')
            l = 'i';
        if(l == 'ö' || l == 'ô')
            l = 'o';
        if(l == 'ü' || l == 'û' || l == 'ù')
            l = 'u';
        if(l == 'ç')
            l = 'c';
        if(l <= 'Z')
            l = 'a' + l - 'A';

        // Note: undefined behavior if l > 'Z' or l < 'A'.
        return children[(unsigned int)(l - 'a')];
    }

    bool inRange(unsigned int n) const
    {
        return n <= max && n >= min;
    }

    void print(std::string str = "") const
    {
        // Here recursive algorithm is way better because of the object-oriented structure
        if(isEndOfWord)
            std::cout << (str + l) << "\n";

        for(unsigned int t = 0; t < 26; t++)
            if(children[t])
                children[t]->print(str + l);
    }

    std::string toString() const
    {
        std::string str(level, '0');
        const DicNode* node = this;
        while(node && node->level > 0)
        {
            str[node->level - 1] = node->l;
            node = node->parent;
        }
        return str;
    }
};

void addWord(DicNode* root, const std::string& str)
{
    if(!root || str == "") return;
    DicNode* node = root;
    unsigned int i = 0;
    while(i != str.size())
    {
        if(str.size() - i > node->max) node->max = str.size() - i;
        if(str.size() - i < node->min) node->min = str.size() - i;

        char c = tolower(str[i]);
        DicNode*& n = node->at(c);
        if(!n) n = new DicNode(c,node);
        node = n;
        i++;
    }
    node->isEndOfWord = true;
}

// **************************************************** //
//                                                      //
// Data structures for the algorithm                    //
//                                                      //
// **************************************************** //

enum Direction
{
    INVALID,
    NORTH,
    NORTH_EAST,
    EAST,
    SOUTH_EAST,
    SOUTH,
    SOUTH_WEST,
    WEST,
    NORTH_WEST,
    FINISHED
};

Direction incDir(Direction d)
{
    switch(d)
    {
        case NORTH:         return NORTH_EAST;
        case NORTH_EAST:    return EAST;
        case EAST:          return SOUTH_EAST;
        case SOUTH_EAST:    return SOUTH;
        case SOUTH:         return SOUTH_WEST;
        case SOUTH_WEST:    return WEST;
        case WEST:          return NORTH_WEST;
        case NORTH_WEST:    return NORTH;
        default:            return INVALID;
    }
}

Direction operator-(Direction d)
{
    switch(d)
    {
        case NORTH:         return SOUTH;
        case NORTH_EAST:    return SOUTH_WEST;
        case EAST:          return WEST;
        case SOUTH_EAST:    return NORTH_WEST;
        case SOUTH:         return NORTH;
        case SOUTH_WEST:    return NORTH_EAST;
        case WEST:          return EAST;
        case NORTH_WEST:    return SOUTH_EAST;
        default:            return INVALID;
    }
}

std::string toString(Direction d)
{
    switch(d)
    {
        case NORTH:         return "NORTH";
        case NORTH_EAST:    return "NORTH_EAST";
        case EAST:          return "EAST";
        case SOUTH_EAST:    return "SOUTH_EAST";
        case SOUTH:         return "SOUTH";
        case SOUTH_WEST:    return "SOUTH_WEST";
        case WEST:          return "WEST";
        case NORTH_WEST:    return "NORTH_WEST";
        default:            return "INVALID";
    }
}

struct Cell
{
    char l;
    DicNode* node;
    Direction dir, dirParent;
    Cell(char l_ = 'A') : l(l_),node(0),dir(INVALID),dirParent(INVALID) {}
};

struct ProbLetter
{
    char letter;
    float proba;
};

class Map
{
    public:
        Map(unsigned int w, unsigned int h) : width(w), height(h)
        {
            data = new Cell[width * height];
            for(unsigned int t = 0; t < width * height; t++)
                data[t] = randomEnglishLetter();
        }

        static char randomEnglishLetter()
        {
            // Frequency of english letters
            static ProbLetter probas[26] =
            {
                { 'Z', 0.074f },
                { 'Q', 0.095f },
                { 'X', 0.150f },
                { 'J', 0.153f },
                { 'K', 0.772f },
                { 'V', 0.978f },
                { 'B', 1.492f },
                { 'P', 1.929f },
                { 'Y', 1.974f },
                { 'G', 2.015f },
                { 'F', 2.228f },
                { 'W', 2.360f },
                { 'M', 2.406f },
                { 'U', 2.758f },
                { 'C', 2.782f },
                { 'L', 4.025f },
                { 'D', 4.253f },
                { 'R', 5.987f },
                { 'H', 6.094f },
                { 'S', 6.327f },
                { 'N', 6.749f },
                { 'I', 6.966f },
                { 'O', 7.507f },
                { 'A', 8.167f },
                { 'T', 9.056f },
                { 'E', 12.702f }
            };

            // Random number between 0 and 1
            float p = 100.0f * float(rand() % 10000) / 9999.0f;
            float sum = 0.0f;
            for(unsigned int t = 0; t < 26; t++)
            {
                sum += probas[t].proba;
                if(p < sum) return probas[t].letter;
            }
            return probas[25].letter;
        }

        Cell& operator()(unsigned int x, unsigned int y)
        {
            return data[x + y * width];
        }

        bool inRange(int x, int y)
        {
            return x >= 0 && x < (int)width && y >= 0 && y < (int)height;
        }

        void print()
        {
            for(unsigned int y = 0; y < height; y++)
            {
                for(unsigned int x = 0; x < width; x++)
                    std::cout << "  " << data[x + y * width].l;
                std::cout << "\n";
            }
        }

        unsigned int getWidth() const { return width; }
        unsigned int getHeight() const { return height; }

    private:
        unsigned int width, height;
        Cell* data;
};


// **************************************************** //
//                                                      //
// Word-search algorithm                                //
//                                                      //
// **************************************************** //

inline void advance(int& x, int& y, Direction d)
{
    switch(d)
    {
        case NORTH:
            y--;
            return;
        case NORTH_EAST:
            x++;
            y--;
            return;
        case EAST:
            x++;
            return;
        case SOUTH_EAST:
            x++;
            y++;
            return;
        case SOUTH:
            y++;
            return;
        case SOUTH_WEST:
            x--;
            y++;
            return;
        case WEST:
            x--;
            return;
        case NORTH_WEST:
            x--;
            y--;
            return;
        default:
            return;
    }
}

struct Data
{
    Map& map;
    int x;
    int y;
};

bool descend(Data& dat, unsigned int n)
{
    DicNode* parent = dat.map(dat.x, dat.y).node;
    if(n == parent->level) return false;

    Direction dir = dat.map(dat.x, dat.y).dir;
    if(dir == FINISHED) return false;
    else if(dir == INVALID) dir = NORTH;

    int pX = dat.x, pY = dat.y;
    bool firstIteration = true;
    for(; firstIteration || dir != NORTH; dir = incDir(dir))
    {
        firstIteration = false;
        pX = dat.x;
        pY = dat.y;
        advance(pX, pY, dir);

        if(dat.map.inRange(pX, pY) // (pX, pY) is not outside the map
            && dat.map(pX, pY).node == 0 // The cell is not already used
            && parent->at(dat.map(pX, pY).l) != 0) // An entry in the dictionary exists
        {
            // We found the next node
            dat.map(dat.x, dat.y).dir = dir;
            dat.map(pX, pY).node = parent->at(dat.map(pX, pY).l);
            dat.map(pX, pY).dirParent = -dir;
            dat.x = pX;
            dat.y = pY;

            return true;
        }
    }

    return false;
}

bool ascend(Data& dat)
{
    // Go back on the previous node

    Direction dp = dat.map(dat.x, dat.y).dirParent;
    if(dp == INVALID) return false;

    dat.map(dat.x, dat.y).dir = INVALID;
    dat.map(dat.x, dat.y).dirParent = INVALID;
    dat.map(dat.x, dat.y).node = 0;
    advance(dat.x, dat.y, dp);

    dat.map(dat.x, dat.y).dir = incDir(dat.map(dat.x, dat.y).dir);
    if(dat.map(dat.x, dat.y).dir == NORTH)
        dat.map(dat.x, dat.y).dir = FINISHED;

    return true;
}

void findWordsFromPosition(Map& map, DicNode* parent, unsigned int x, unsigned int y, unsigned int n, std::vector<std::string>& output)
{
    if(!parent) return;

    bool ok = true;

    // Setup first node
    map(x, y).node = parent;
    map(x, y).dir = NORTH;

    // while we can find next node with direction
    // If marked node has right order and is end of word, or if condition on n is not verified:
    //     add it to the output (if condition on n is verified)
    //     no need to go further (because of n), so advance direction of parent cell, reset current cell, and go up for the current position
    Data dat = { map, x, y };
    while(ok)
    {
        if(map(dat.x,dat.y).node->toString() == "ane")
            std::cout << "ok\n";
        if(!descend(dat, n))
            ok = ascend(dat);
        else
        {
            DicNode* node = dat.map(dat.x, dat.y).node;
            if(node->level == n && node->isEndOfWord)
            {
                std::string str = node->toString();
                if(std::find(output.begin(), output.end(), str) == output.end())
                    output.push_back(str);
                ok = ascend(dat);
            }
            else if(!node->inRange(n - node->level))
                ok = ascend(dat);
        }
    }

    // Clean first node
    map(x, y).dir = INVALID;
    map(x, y).dirParent = INVALID;
    map(x, y).node = 0;
}

void getWords(Map& map, DicNode& dic, unsigned int n, std::vector<std::string>& output)
{
    if(n > dic.max || n < dic.min) return;

    // Search words for every position
    for(unsigned int y = 0; y < map.getHeight(); y++)
        for(unsigned int x = 0; x < map.getWidth(); x++)
            findWordsFromPosition(map,dic.at(map(x,y).l),x,y,n,output);
}

#include <fstream>

int main()
{
    srand((unsigned int)time(0));
    // Prepare the words, for example, load them from a real dictionary
    DicNode dictionary(0);
    unsigned int maxSize = 0;

    // Note: the following code make sense only if you load words from a file
    {
        std::ifstream dico("english.txt");
        std::string str;
        int count = 0;
        while(dico >> str)
        {
            if(str.size() > maxSize) maxSize = str.size();
            addWord(&dictionary,str);
            count++;
        }
        std::cout << "Read " << count << " words from dictionary, longest with " << maxSize << " letters.\n";
    }

    // Prepare the matrix
    Map mat(4,4);
    /*
    mat(0,0) = 'A';
    mat(1,0) = 'N';
    mat(2,0) = 'F';
    mat(3,0) = 'N';
    mat(0,1) = 'S';
    mat(1,1) = 'L';
    mat(2,1) = 'E';
    mat(3,1) = 'R';
    mat(0,2) = 'G';
    mat(1,2) = 'O';
    mat(2,2) = 'R';
    mat(3,2) = 'R';
    mat(0,3) = 'S';
    mat(1,3) = 'I';
    mat(2,3) = 'U';
    mat(3,3) = 'I';
    */
    std::cout << "\nMatrix:\n";
    mat.print();

    // Search words
    std::vector<std::string> words;
    for(unsigned int t = 5; t <= 7; t++)
        getWords(mat,dictionary,t,words);
    std::cout << "\nWords:\n";
    if(words.size() == 0)
        std::cout << " (no words)\n";
    else
        for(unsigned int t = 0; t < words.size(); t++)
            std::cout << " " << words[t] << "\n";
}

注意：前面的代码是一个代码搜索解析器。它现在已从答案中删除，但您仍然可以通过在此答案的编辑修订中搜索来获得它。

EDIT2：一些口头解释

词典。它是一种基数树，但每个节点只有一个字母要存储。它最多可以有 26 个孩子（如果包含重音和/或数字，则更多），每个字母对应一个字母。基本布局如下：

词树 http://img571.imageshack.us/img571/2281/wtree.png

例子：

字典示例 http://img706.imageshack.us/img706/1244/wordsr.png

每个节点还包含一个布尔值，指示它是否是单词的结尾。此外，每个节点存储其子分支的最小和最大大小、其父级及其从根开始的级别（= 前缀的大小）。当我们看到没有子分支可以给出具有特定长度的单词时，它允许停止搜索单词。

矩阵。矩阵是一个双字母数组。但是，为了提高效率，我添加了一些数据：它在字典中的对应节点（参见算法）、到它的子节点（参见算法）和它的父节点（参见算法）的方向。

算法。这个想法是通过“降序”（增加表示当前找到的字符串的路径）在地图中“循环”。当无法下降时，我们必须“上升”：

FIND WORD:
----------

while(we can continue)
    try descend
    if(failed to descend)
        ascend
    else
        node = node of current cell
        if node is end of word
            add it to the found words, then ascend
        else if node is out of range (ie, n = 5, but with node you can only have words with 3 characters)
            ascend

DESCEND:
--------

we're from cell (x,y), with node D
for each direction (from cell.direction to north-west, clockwise)
    advance position (x,y) with direction
    if the position is valid
        and we haven't already visited the cell for this word
        and the node got with D.children[cell.letter] exists
        set the current position of the algorithm to the result of the advance
        set the direction of the old cell to the direction used to advance
        set the direction of the new cell to the opposite (pointing to the old cell)
        set the node of the new cell by the found one
        return

ASCEND:
-------

we're at (x,y)
cell.node = 0 (the cell is not considered as visited anymore)
advance the parent direction by one clockwise (if this direction is north-west, we use a special direction that say that we cannot rotate anymore: FINISHED)
cell.dir = INVALID
advance (x,y) following cell.direction_to_parent
cell.direction_to_parent = INVALID
set the current position to the result of advance

图片说明：

算法展开 http://img822.imageshack.us/img822/1374/algow.png

步骤（在损失中标明的数字）：

这是我们的矩阵。我用法语词典对其进行了测试。假设我们从 (0,0) 单元格开始（无论您从哪里开始，该算法都有效）。
第一个有效方向（地图中的单词+）是东。我们跟随它前进。该节点不是单词的结尾，并且仍然有效。
第一个有效方向（地图中的单词 +）是东南。我们跟随它前进。该节点不是单词的结尾，并且仍然有效。
第一个有效方向（地图中的单词+）是东。我们跟随它前进。该节点不是单词的结尾，并且仍然有效。
没有有效的方向（地图外、已访问的单元格（E）或不在字典中），所以我们上升
第一个有效方向（地图中的单词 +）是东南。我们跟随它前进。该节点不是单词的结尾，并且仍然有效。
第一个有效方向（地图中的单词+）是南。我们跟随它前进。该节点不是单词的结尾，并且仍然有效。
第一个有效方向（地图中的单词 +）是西。我们跟随它前进。该节点是一个单词的结尾（即“anerie”），因此我们将其添加到找到的单词列表中，然后上升。有关信息，法语中的“ânerie”在英语中是“废话”或“愚蠢”。
等等...

score 2 · Accepted Answer

您正在处理大量组合，因为您不是在寻找实际的单词。如果是这样，您将能够查找字符组合并查看在特定语言中是否仍然可以使用单词。查找会花费更多时间，但会消除许多组合。
但最后，你愿意花多少时间让应用程序更快一点？

话虽如此：优化可能存在可能性，因为您按值传递向量和字符串，它们都会生成内存分配（当然对于向量，可能对于字符串）。相反，传递一个std::array<lettera*,wordlength>应该更快，并且也会给你这个词。

score 2 · Accepted Answer

首先，您正在寻找的解决方案的数量随着长度呈指数增长，因此任何解决方案都是指数的；你所能做的就是减少常数因子。

但是，我建议对您已经找到的字符串进行排序，并对每个新字符串进行二进制搜索。对于 7 的长度，一旦排除重复项，您会发现超过 60000 个。（在一个快速实现中，我自己敲了敲，这造成了超过 10 倍的差异，差异越大，我正在寻找的字符串越长。）同样，用一个标志来注释每个单元格，指示它是否被访问或不，而不必以某种方式查找它，也应该节省大量时间。

c++ - 优化这个递归函数 [Boggle resolver]

3 回答 3

Related

Reference