将您的算法从递归更改为迭代(使用循环)。它通常更快。之后,其他优化与算法和数据结构相关,以降低复杂度。找到这样的优化要困难得多(但有可能)。
编辑:对代码的一些想法,然后是我今天完成的单词搜索的一个版本:
- 你只想要英语单词(或意大利语单词等......)。要检查这一点,您应该使用字典,它是查找单词的最有效结构,无论是在内存还是性能方面
- 有评论说这个算法是强递归的。我不同意,我做了一个迭代的,向量的副本少得多,并且有单词检查。将算法从递归传递到迭代通常会减少副本的数量。但我仍然同意这个评论的一点:某些算法在递归形式中可能会更快(但它们并不常见)。此外,这个算法并不容易找到;)
- 你应该拆分你的代码。它将更具可读性,并且更易于优化。
- 您正在编写一个 boggle 解析器,因此您应该知道您的字典必须是精确的,并且应该包含所有形式的动词(例如,对于“be”:am、are、is、was、was 等... )。我用法语测试它,因为我的英文单词列表只有 850 个单词,太短了。
- 字典中带有重音的字母的条件并不重要,它们在这里只是为了避免使用更多的重音插槽。此外,我从文本文件中删除了每个组成的单词(其中带有插入符号)。
- 在我的电脑上,加载字典(非详尽的 350000 个法语单词,3 Mb)并搜索所有 5,6 和 7 长度的单词大约需要 0.6 秒(i7 2660K,8Gb RAM,使用 /0x 发布,由我测量,而不是由计算机;))。性能测量仅与您的机器相关,以进行比较。
所以,这里是代码。它是独立的,您只需编译它,使用英语(或意大利语或法语)单词文件(文本,每行一个单词)执行:
#include <string>
#include <iostream>
#include <stdlib.h>
#include <vector>
#include <time.h>
#include <map>
#include <algorithm>
// **************************************************** //
// //
// Data structure for the words : dictionnary //
// //
// **************************************************** //
struct DicNode
{
const char l;
bool isEndOfWord;
DicNode* children[26]; // a - z, case insensitive
DicNode* parent;
unsigned int min, max, level;
DicNode(char c, DicNode* p = 0, bool end = false) : l(c),isEndOfWord(end),parent(p),min((unsigned int)(-1)),max(0), level(parent ? parent->level + 1 : 0)
{
for(unsigned int t = 0; t < 26; t++)
children[t] = 0;
}
~DicNode()
{
for(unsigned int t = 0; t < 26; t++)
if(children[t])
delete children[t];
}
DicNode*& at(char l)
{
if(l == 'à' || l == 'ä' || l == 'â')
l = 'a';
if(l == 'é' || l == 'è' || l == 'ê' || l == 'ë')
l = 'e';
if(l == 'ï' || l == 'î')
l = 'i';
if(l == 'ö' || l == 'ô')
l = 'o';
if(l == 'ü' || l == 'û' || l == 'ù')
l = 'u';
if(l == 'ç')
l = 'c';
if(l <= 'Z')
l = 'a' + l - 'A';
// Note: undefined behavior if l > 'Z' or l < 'A'.
return children[(unsigned int)(l - 'a')];
}
bool inRange(unsigned int n) const
{
return n <= max && n >= min;
}
void print(std::string str = "") const
{
// Here recursive algorithm is way better because of the object-oriented structure
if(isEndOfWord)
std::cout << (str + l) << "\n";
for(unsigned int t = 0; t < 26; t++)
if(children[t])
children[t]->print(str + l);
}
std::string toString() const
{
std::string str(level, '0');
const DicNode* node = this;
while(node && node->level > 0)
{
str[node->level - 1] = node->l;
node = node->parent;
}
return str;
}
};
void addWord(DicNode* root, const std::string& str)
{
if(!root || str == "") return;
DicNode* node = root;
unsigned int i = 0;
while(i != str.size())
{
if(str.size() - i > node->max) node->max = str.size() - i;
if(str.size() - i < node->min) node->min = str.size() - i;
char c = tolower(str[i]);
DicNode*& n = node->at(c);
if(!n) n = new DicNode(c,node);
node = n;
i++;
}
node->isEndOfWord = true;
}
// **************************************************** //
// //
// Data structures for the algorithm //
// //
// **************************************************** //
enum Direction
{
INVALID,
NORTH,
NORTH_EAST,
EAST,
SOUTH_EAST,
SOUTH,
SOUTH_WEST,
WEST,
NORTH_WEST,
FINISHED
};
Direction incDir(Direction d)
{
switch(d)
{
case NORTH: return NORTH_EAST;
case NORTH_EAST: return EAST;
case EAST: return SOUTH_EAST;
case SOUTH_EAST: return SOUTH;
case SOUTH: return SOUTH_WEST;
case SOUTH_WEST: return WEST;
case WEST: return NORTH_WEST;
case NORTH_WEST: return NORTH;
default: return INVALID;
}
}
Direction operator-(Direction d)
{
switch(d)
{
case NORTH: return SOUTH;
case NORTH_EAST: return SOUTH_WEST;
case EAST: return WEST;
case SOUTH_EAST: return NORTH_WEST;
case SOUTH: return NORTH;
case SOUTH_WEST: return NORTH_EAST;
case WEST: return EAST;
case NORTH_WEST: return SOUTH_EAST;
default: return INVALID;
}
}
std::string toString(Direction d)
{
switch(d)
{
case NORTH: return "NORTH";
case NORTH_EAST: return "NORTH_EAST";
case EAST: return "EAST";
case SOUTH_EAST: return "SOUTH_EAST";
case SOUTH: return "SOUTH";
case SOUTH_WEST: return "SOUTH_WEST";
case WEST: return "WEST";
case NORTH_WEST: return "NORTH_WEST";
default: return "INVALID";
}
}
struct Cell
{
char l;
DicNode* node;
Direction dir, dirParent;
Cell(char l_ = 'A') : l(l_),node(0),dir(INVALID),dirParent(INVALID) {}
};
struct ProbLetter
{
char letter;
float proba;
};
class Map
{
public:
Map(unsigned int w, unsigned int h) : width(w), height(h)
{
data = new Cell[width * height];
for(unsigned int t = 0; t < width * height; t++)
data[t] = randomEnglishLetter();
}
static char randomEnglishLetter()
{
// Frequency of english letters
static ProbLetter probas[26] =
{
{ 'Z', 0.074f },
{ 'Q', 0.095f },
{ 'X', 0.150f },
{ 'J', 0.153f },
{ 'K', 0.772f },
{ 'V', 0.978f },
{ 'B', 1.492f },
{ 'P', 1.929f },
{ 'Y', 1.974f },
{ 'G', 2.015f },
{ 'F', 2.228f },
{ 'W', 2.360f },
{ 'M', 2.406f },
{ 'U', 2.758f },
{ 'C', 2.782f },
{ 'L', 4.025f },
{ 'D', 4.253f },
{ 'R', 5.987f },
{ 'H', 6.094f },
{ 'S', 6.327f },
{ 'N', 6.749f },
{ 'I', 6.966f },
{ 'O', 7.507f },
{ 'A', 8.167f },
{ 'T', 9.056f },
{ 'E', 12.702f }
};
// Random number between 0 and 1
float p = 100.0f * float(rand() % 10000) / 9999.0f;
float sum = 0.0f;
for(unsigned int t = 0; t < 26; t++)
{
sum += probas[t].proba;
if(p < sum) return probas[t].letter;
}
return probas[25].letter;
}
Cell& operator()(unsigned int x, unsigned int y)
{
return data[x + y * width];
}
bool inRange(int x, int y)
{
return x >= 0 && x < (int)width && y >= 0 && y < (int)height;
}
void print()
{
for(unsigned int y = 0; y < height; y++)
{
for(unsigned int x = 0; x < width; x++)
std::cout << " " << data[x + y * width].l;
std::cout << "\n";
}
}
unsigned int getWidth() const { return width; }
unsigned int getHeight() const { return height; }
private:
unsigned int width, height;
Cell* data;
};
// **************************************************** //
// //
// Word-search algorithm //
// //
// **************************************************** //
inline void advance(int& x, int& y, Direction d)
{
switch(d)
{
case NORTH:
y--;
return;
case NORTH_EAST:
x++;
y--;
return;
case EAST:
x++;
return;
case SOUTH_EAST:
x++;
y++;
return;
case SOUTH:
y++;
return;
case SOUTH_WEST:
x--;
y++;
return;
case WEST:
x--;
return;
case NORTH_WEST:
x--;
y--;
return;
default:
return;
}
}
struct Data
{
Map& map;
int x;
int y;
};
bool descend(Data& dat, unsigned int n)
{
DicNode* parent = dat.map(dat.x, dat.y).node;
if(n == parent->level) return false;
Direction dir = dat.map(dat.x, dat.y).dir;
if(dir == FINISHED) return false;
else if(dir == INVALID) dir = NORTH;
int pX = dat.x, pY = dat.y;
bool firstIteration = true;
for(; firstIteration || dir != NORTH; dir = incDir(dir))
{
firstIteration = false;
pX = dat.x;
pY = dat.y;
advance(pX, pY, dir);
if(dat.map.inRange(pX, pY) // (pX, pY) is not outside the map
&& dat.map(pX, pY).node == 0 // The cell is not already used
&& parent->at(dat.map(pX, pY).l) != 0) // An entry in the dictionary exists
{
// We found the next node
dat.map(dat.x, dat.y).dir = dir;
dat.map(pX, pY).node = parent->at(dat.map(pX, pY).l);
dat.map(pX, pY).dirParent = -dir;
dat.x = pX;
dat.y = pY;
return true;
}
}
return false;
}
bool ascend(Data& dat)
{
// Go back on the previous node
Direction dp = dat.map(dat.x, dat.y).dirParent;
if(dp == INVALID) return false;
dat.map(dat.x, dat.y).dir = INVALID;
dat.map(dat.x, dat.y).dirParent = INVALID;
dat.map(dat.x, dat.y).node = 0;
advance(dat.x, dat.y, dp);
dat.map(dat.x, dat.y).dir = incDir(dat.map(dat.x, dat.y).dir);
if(dat.map(dat.x, dat.y).dir == NORTH)
dat.map(dat.x, dat.y).dir = FINISHED;
return true;
}
void findWordsFromPosition(Map& map, DicNode* parent, unsigned int x, unsigned int y, unsigned int n, std::vector<std::string>& output)
{
if(!parent) return;
bool ok = true;
// Setup first node
map(x, y).node = parent;
map(x, y).dir = NORTH;
// while we can find next node with direction
// If marked node has right order and is end of word, or if condition on n is not verified:
// add it to the output (if condition on n is verified)
// no need to go further (because of n), so advance direction of parent cell, reset current cell, and go up for the current position
Data dat = { map, x, y };
while(ok)
{
if(map(dat.x,dat.y).node->toString() == "ane")
std::cout << "ok\n";
if(!descend(dat, n))
ok = ascend(dat);
else
{
DicNode* node = dat.map(dat.x, dat.y).node;
if(node->level == n && node->isEndOfWord)
{
std::string str = node->toString();
if(std::find(output.begin(), output.end(), str) == output.end())
output.push_back(str);
ok = ascend(dat);
}
else if(!node->inRange(n - node->level))
ok = ascend(dat);
}
}
// Clean first node
map(x, y).dir = INVALID;
map(x, y).dirParent = INVALID;
map(x, y).node = 0;
}
void getWords(Map& map, DicNode& dic, unsigned int n, std::vector<std::string>& output)
{
if(n > dic.max || n < dic.min) return;
// Search words for every position
for(unsigned int y = 0; y < map.getHeight(); y++)
for(unsigned int x = 0; x < map.getWidth(); x++)
findWordsFromPosition(map,dic.at(map(x,y).l),x,y,n,output);
}
#include <fstream>
int main()
{
srand((unsigned int)time(0));
// Prepare the words, for example, load them from a real dictionary
DicNode dictionary(0);
unsigned int maxSize = 0;
// Note: the following code make sense only if you load words from a file
{
std::ifstream dico("english.txt");
std::string str;
int count = 0;
while(dico >> str)
{
if(str.size() > maxSize) maxSize = str.size();
addWord(&dictionary,str);
count++;
}
std::cout << "Read " << count << " words from dictionary, longest with " << maxSize << " letters.\n";
}
// Prepare the matrix
Map mat(4,4);
/*
mat(0,0) = 'A';
mat(1,0) = 'N';
mat(2,0) = 'F';
mat(3,0) = 'N';
mat(0,1) = 'S';
mat(1,1) = 'L';
mat(2,1) = 'E';
mat(3,1) = 'R';
mat(0,2) = 'G';
mat(1,2) = 'O';
mat(2,2) = 'R';
mat(3,2) = 'R';
mat(0,3) = 'S';
mat(1,3) = 'I';
mat(2,3) = 'U';
mat(3,3) = 'I';
*/
std::cout << "\nMatrix:\n";
mat.print();
// Search words
std::vector<std::string> words;
for(unsigned int t = 5; t <= 7; t++)
getWords(mat,dictionary,t,words);
std::cout << "\nWords:\n";
if(words.size() == 0)
std::cout << " (no words)\n";
else
for(unsigned int t = 0; t < words.size(); t++)
std::cout << " " << words[t] << "\n";
}
注意:前面的代码是一个代码搜索解析器。它现在已从答案中删除,但您仍然可以通过在此答案的编辑修订中搜索来获得它。
EDIT2:一些口头解释
词典。它是一种基数树,但每个节点只有一个字母要存储。它最多可以有 26 个孩子(如果包含重音和/或数字,则更多),每个字母对应一个字母。基本布局如下:
词树 http://img571.imageshack.us/img571/2281/wtree.png
例子:
字典示例 http://img706.imageshack.us/img706/1244/wordsr.png
每个节点还包含一个布尔值,指示它是否是单词的结尾。此外,每个节点存储其子分支的最小和最大大小、其父级及其从根开始的级别(= 前缀的大小)。当我们看到没有子分支可以给出具有特定长度的单词时,它允许停止搜索单词。
矩阵。矩阵是一个双字母数组。但是,为了提高效率,我添加了一些数据:它在字典中的对应节点(参见算法)、到它的子节点(参见算法)和它的父节点(参见算法)的方向。
算法。这个想法是通过“降序”(增加表示当前找到的字符串的路径)在地图中“循环”。当无法下降时,我们必须“上升”:
FIND WORD:
----------
while(we can continue)
try descend
if(failed to descend)
ascend
else
node = node of current cell
if node is end of word
add it to the found words, then ascend
else if node is out of range (ie, n = 5, but with node you can only have words with 3 characters)
ascend
DESCEND:
--------
we're from cell (x,y), with node D
for each direction (from cell.direction to north-west, clockwise)
advance position (x,y) with direction
if the position is valid
and we haven't already visited the cell for this word
and the node got with D.children[cell.letter] exists
set the current position of the algorithm to the result of the advance
set the direction of the old cell to the direction used to advance
set the direction of the new cell to the opposite (pointing to the old cell)
set the node of the new cell by the found one
return
ASCEND:
-------
we're at (x,y)
cell.node = 0 (the cell is not considered as visited anymore)
advance the parent direction by one clockwise (if this direction is north-west, we use a special direction that say that we cannot rotate anymore: FINISHED)
cell.dir = INVALID
advance (x,y) following cell.direction_to_parent
cell.direction_to_parent = INVALID
set the current position to the result of advance
图片说明:
算法展开 http://img822.imageshack.us/img822/1374/algow.png
步骤(在损失中标明的数字):
- 这是我们的矩阵。我用法语词典对其进行了测试。假设我们从 (0,0) 单元格开始(无论您从哪里开始,该算法都有效)。
- 第一个有效方向(地图中的单词+)是东。我们跟随它前进。该节点不是单词的结尾,并且仍然有效。
- 第一个有效方向(地图中的单词 +)是东南。我们跟随它前进。该节点不是单词的结尾,并且仍然有效。
- 第一个有效方向(地图中的单词+)是东。我们跟随它前进。该节点不是单词的结尾,并且仍然有效。
- 没有有效的方向(地图外、已访问的单元格(E)或不在字典中),所以我们上升
- 第一个有效方向(地图中的单词 +)是东南。我们跟随它前进。该节点不是单词的结尾,并且仍然有效。
- 第一个有效方向(地图中的单词+)是南。我们跟随它前进。该节点不是单词的结尾,并且仍然有效。
- 第一个有效方向(地图中的单词 +)是西。我们跟随它前进。该节点是一个单词的结尾(即“anerie”),因此我们将其添加到找到的单词列表中,然后上升。有关信息,法语中的“ânerie”在英语中是“废话”或“愚蠢”。
- 等等...