9

In a word game similar to Ruzzle or Letterpress, users have to construct words from a given set of letters:

(screenshot of the game board)

I keep my dictionary in a simple SQL table:

create table good_words (
        word varchar(16) primary key
);

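Since a round is very short, I do not want to validate each entered word by calling a PHP script that looks it up in the good_words table.

Instead, I would like to download all possible words with a single PHP script call before the round starts, since all letters are known at that point.

My question: is there a nice SQLish way to find such words?

For example, I could run a longer script once to add a column to the good_words table holding the same letters as the word column, but sorted alphabetically. However, I still cannot think of a way to match that column against a given set of letters.

Doing the word matching inside the PHP script (rather than inside the database) would probably take too long because of bandwidth: every row would have to be fetched from the database into the PHP script.

Any suggestions or insights, please?

I am using postgresql-8.4.13 on CentOS Linux 6.3.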

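UPDATE:

Other ideas I have:

  1. Create a constantly running script (cronjob or daemon) that prefills an SQL table with precompiled letter boards and their possible words. This still feels like a waste of bandwidth and CPU; I would prefer to solve this inside the database.
  2. Add integer columns a, b, ... , z and, whenever I store a word into good_words, store the letter occurrences there. I wonder whether an insert trigger could be written in PL/pgSQL for that.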
4

7 Answers

4

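Nice question, I upvoted.

What you are after is a list of all possible permutations of the given letters at a given length. As described in the PostgreSQL wiki, you can create a function and call it like this (the letters match the ones highlighted in your screenshot):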

SELECT * FROM permute('{E,R,O,M}'::text[]);

Now, to query good_words, use something like:

SELECT gw.word
  FROM good_words gw
  JOIN permute('{E,R,O,M}'::text[]) s(w) ON gw.word=array_to_string(s.w, '');
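
Just to illustrate the idea outside the database, here is a rough Python sketch: it generates every ordering of every subset of the letters and keeps those found in the dictionary. The function name and the `good_words` set are made up for the example (the set stands in for the table). Keep in mind that the number of permutations grows factorially, so this is only practical for a handful of letters.

```python
from itertools import permutations

def words_from_letters(letters, good_words, min_len=1):
    """Return dictionary words that can be spelled from `letters`,
    using each letter tile at most once."""
    found = set()
    for n in range(min_len, len(letters) + 1):
        # permutations() treats equal letters at different positions as distinct tiles
        for perm in permutations(letters, n):
            candidate = "".join(perm)
            if candidate in good_words:
                found.add(candidate)
    return found

# Hypothetical stand-in for the good_words table.
good_words = {"more", "rome", "ore", "roe", "me", "room"}
print(sorted(words_from_letters("erom", good_words)))
```

Here "room" is rejected because the letters contain only one "o".
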
answered 2013-03-05T10:07:26.783
2

This could be a start, except that it only checks whether the word uses the right letters, not whether we have enough of each letter.

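SELECT word from
-- one row per (word, position); substring() positions are 1-based
(select word,generate_series(1,length(word)) as s from good_words) as q
WHERE substring(word,s,1) IN ('t','h','e','l','e','t','t','e','r','s')
GROUP BY word
-- keep only words whose every position passed the letter filter
HAVING count(*)>=length(word);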

http://sqlfiddle.com/#!1/2e3a2/3
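
The gap mentioned above (right letters versus enough letters) is the difference between a set test and a multiset test. A minimal Python sketch of both, with hypothetical helper names, in case it helps to see them side by side:

```python
from collections import Counter

def uses_only_board_letters(word, board):
    """Set test: every letter of the word occurs somewhere on the board."""
    return set(word) <= set(board)

def can_build(word, board):
    """Multiset test: the board has at least as many copies of each
    letter as the word needs."""
    need = Counter(word)
    have = Counter(board)
    return all(have[ch] >= n for ch, n in need.items())

board = "theletters"
for word in ("settle", "hhh"):
    print(word, uses_only_board_letters(word, board), can_build(word, board))
```

"hhh" passes the set test but fails the multiset test, because the board has a single "h".
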

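EDIT:

This query selects only the valid words, though it looks somewhat redundant. It is not perfect, but it certainly proves that it can be done.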

WITH words AS 
(SELECT word, substring(word,s,1) as sub from
(select word,generate_series(1,length(word)) as s from good_words) as q
WHERE substring(word,s,1) IN ('t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'))

SELECT w.word FROM
(
SELECT word,words.sub,count(DISTINCT s) as cnt FROM
(SELECT s, substring(array_to_string(l, ''),s,1) as sub FROM
(SELECT l, generate_subscripts(l,1) as s FROM 
 (SELECT ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'] as l) 
 as q) 
as q) as let JOIN
words ON let.sub=words.sub
GROUP BY words.word,words.sub) as let
JOIN
(select word,sub,count(*) as cnt from words
 GROUP BY word, sub)
as w ON let.word=w.word AND let.sub=w.sub AND let.cnt>=w.cnt
GROUP BY w.word
HAVING sum(w.cnt)=length(w.word);

Fiddle with all possible 3+ letter words (485) for that image: http://sqlfiddle.com/#!1/2fc66/1
Fiddle with 699 words, of which 485 are correct: http://sqlfiddle.com/#!1/4f42e/1

EDIT 2: We can use array operators like this to get the list of words made up only of the letters we want:

SELECT word as sub from
(select word,generate_series(1,length(word)) as s from good_words) as q
GROUP BY word
HAVING array_agg(substring(word,s,1)) <@ ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'];

So we can use it to narrow down the list of words we need to check.

WITH words AS 
(SELECT word, substring(word,s,1) as sub from
(select word,generate_series(1,length(word)) as s from 
(
  SELECT word from
(select word,generate_series(1,length(word)) as s from good_words) as q
GROUP BY word
HAVING array_agg(substring(word,s,1)) <@ ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s']
)as q) as q)
SELECT DISTINCT w.word FROM
(
SELECT word,words.sub,count(DISTINCT s) as cnt FROM
(SELECT s, substring(array_to_string(l, ''),s,1) as sub FROM
(SELECT l, generate_subscripts(l,1) as s FROM 
 (SELECT ARRAY['t','e','s','e','r','e','r','o','r','e','m','a','s','d','s','s'] as l) 
 as q) 
as q) as let JOIN
words ON let.sub=words.sub
GROUP BY words.word,words.sub) as let
JOIN
(select word,sub,count(*) as cnt from words
 GROUP BY word, sub)
as w ON let.word=w.word AND let.sub=w.sub AND let.cnt>=w.cnt
GROUP BY w.word
HAVING sum(w.cnt)=length(w.word) ORDER BY w.word;

http://sqlfiddle.com/#!1/4f42e/44

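GIN indexes work on arrays, so we could probably create a table that stores arrays of letters and make the words point to it (act, cat and tact would all point to the array [a,c,t]). That would probably speed things up, but it remains to be tested.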
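
To make the "words point to a letter array" idea a bit more concrete, here is a small Python sketch under my own assumptions: words are grouped by their set of distinct letters, the board first selects the groups whose letters it contains, and only then are the per-letter counts checked. All names are hypothetical, and a plain dict stands in for the indexed table.

```python
from collections import Counter, defaultdict

def build_index(words):
    """Map each distinct-letter set to the words made of exactly those letters."""
    index = defaultdict(list)
    for w in words:
        index[frozenset(w)].append(w)
    return index

def candidates(index, board):
    """Words whose letter set is contained in the board's letter set
    (letter counts are not checked yet)."""
    board_set = set(board)
    return [w for key, ws in index.items() if key <= board_set for w in ws]

def playable(index, board):
    """Candidates that also pass the per-letter count check."""
    have = Counter(board)
    # Counter subtraction keeps only positive counts, so an empty
    # result means the board covers every letter of the word.
    return sorted(w for w in candidates(index, board) if not (Counter(w) - have))

index = build_index(["act", "cat", "tact", "dog"])
print(sorted(candidates(index, "catx")), playable(index, "catx"))
```

With the board "catx", all three of act, cat and tact are candidates, but tact drops out in the count check because there is only one "t".
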

answered 2013-03-05T09:58:40.057
1

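Create a table with entries (id, char), and let n be the number of characters you are querying for.

select id, count(char) AS count from chartable where (char = x or char = y or char = z ...) group by id having count(char) = n;

or (for partial matching)

select id, count(char) AS count from chartable where (char = x or char = y or char = z ...) group by id order by count;

The result of that query contains all the word IDs that fit the specification. Cache the result in a HashSet and simply do a lookup whenever a word is entered.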

answered 2013-03-05T09:46:29.013
1

Does not work in 8.4. Probably 9.1+ only. SQL Fiddle

select word
from (
    select unnest(string_to_array(word, null)) c, word from good_words
    intersect all
    select unnest(string_to_array('TESTREROREMASDSS', null)) c, word from good_words
) s
group by word
having
    array_agg(c order by c) = 
    (select array_agg(c order by c) from unnest(string_to_array(word, null)) a(c))
answered 2013-03-05T17:52:52.200
1

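You can add a column holding the sorted letters, formatted like '%a%c%t%'. Then use the query:

 select * from good_words where 'abcttx' like sorted_letters

to find the words that can be built from the letters 'abcttx' (the letters on the left-hand side must be sorted too). I don't know about performance, but the simplicity probably can't be beaten :)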
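
For anyone who wants to see why the pattern trick works, here is a small Python sketch. The `sql_like` helper is only a toy emulation of LIKE that understands the % wildcard, and all function names are made up for the example. The point is that matching '%a%c%t%' against the sorted board letters amounts to a subsequence test, which also handles repeated letters.

```python
import re

def like_pattern(word):
    """Build the '%a%c%t%' style pattern from a word's sorted letters."""
    return "%" + "%".join(sorted(word)) + "%"

def sql_like(value, pattern):
    """Toy emulation of SQL LIKE that supports only the % wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def can_build(word, board):
    # The board letters are sorted too, otherwise the pattern cannot line up.
    return sql_like("".join(sorted(board)), like_pattern(word))

print(like_pattern("cat"))          # %a%c%t%
print(can_build("tact", "abcttx"))  # two t's available
print(can_build("tatt", "abcttx"))  # would need three t's
```
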

answered 2013-03-05T19:39:36.787
1

Here is a query that finds the words by walking through adjacent fields of the board.

with recursive
input as (select '{{"t","e","s","e"},{"r","e","r","o"},{"r","e","m","a"},{"s","d","s","s"}}'::text[] as inp),
dxdy as(select * from (values(-1,-1),(-1,0),(-1,1),(0,1),(0,-1),(1,-1),(1,0),(1,1)) as v(dx, dy)),
start_position as(select * from generate_series(1,4) x, generate_series(1,4) y),
work as(select x,y,inp[y][x] as word from start_position, input
union
select w.x + dx, w.y + dy, w.word || inp[w.y+dy][w.x+dx]   
   from dxdy cross join input cross join work w 
   inner join good_words gw on gw.word like w.word || '%'
)
select distinct word from work
where exists(select * from good_words gw where gw.word = work.word)

(The other answers do not take this into account.)

SQL Fiddle link: http://sqlfiddle.com/#!1/013cc/14 (note that you need an index with varchar_pattern_ops for the query to be reasonably fast).
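
The same walk can be sketched in Python as a depth-first search with prefix pruning, which plays the role of the `like w.word || '%'` join. This is only a sketch: the function name and the tiny dictionary are made up, and it assumes a cell may be used at most once per word, which is my reading of the game rules rather than something the query above enforces.

```python
def find_board_words(board, good_words):
    """Walk adjacent cells (8 directions) and collect dictionary words.

    Assumption: a cell may be used at most once per word.
    """
    # Every prefix of every dictionary word, used to cut dead-end paths early.
    prefixes = {w[:i] for w in good_words for i in range(1, len(w) + 1)}
    rows, cols = len(board), len(board[0])
    found = set()

    def walk(y, x, word, used):
        if word not in prefixes:  # no dictionary word starts like this
            return
        if word in good_words:
            found.add(word)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if ((dy or dx) and 0 <= ny < rows and 0 <= nx < cols
                        and (ny, nx) not in used):
                    walk(ny, nx, word + board[ny][nx], used | {(ny, nx)})

    for y in range(rows):
        for x in range(cols):
            walk(y, x, board[y][x], {(y, x)})
    return found

board = ["tese", "rero", "rema", "sdss"]  # same letters as in the query above
good_words = {"tree", "more", "rose", "seem", "dear"}  # hypothetical dictionary
print(sorted(find_board_words(board, good_words)))
```

In this example "dear" is left out even though all of its letters are on the board, because the only "e" next to the "d" is not adjacent to the "a".
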

answered 2013-03-05T21:05:45.123
0

My own solution is to create an insert trigger that writes the letter frequencies into an array column:

create table good_words (
        word varchar(16) primary key,
        letters integer[26]
);

create or replace function count_letters() returns trigger as $body$
    declare
        alphabet varchar[];
        i integer;
    begin

        alphabet := regexp_split_to_array('abcdefghijklmnopqrstuvwxyz', '');
        new.word := lower(new.word);

        for i in 1 .. array_length(alphabet, 1)
        loop
                -- raise notice '%: %', i, alphabet[i];
                new.letters[i] := length(new.word) - length(replace(new.word, alphabet[i], ''));
        end loop;
        return new;
    end;
$body$ language plpgsql;

create trigger count_letters
    before insert on good_words
    for each row execute procedure count_letters();

Then I generate a similar array for the random board string tesereroremasdss and compare the two arrays using the array contains operator @>
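
One caveat on the comparison step: as far as I know, @> on arrays tests whether every element of the right-hand array appears somewhere in the left-hand one, ignoring position and duplicates, so on frequency arrays it may not express "at least as many of each letter". What the check needs is a position-by-position comparison. A minimal Python sketch of that comparison, with hypothetical function names:

```python
import string

def letter_counts(text):
    """26-element list: how many times each of a..z occurs in text."""
    text = text.lower()
    return [text.count(ch) for ch in string.ascii_lowercase]

def fits(word_counts, board_counts):
    """True if the board has at least as many of every letter as the word needs."""
    return all(w <= b for w, b in zip(word_counts, board_counts))

board_counts = letter_counts("tesereroremasdss")
for word in ("roses", "teeth"):
    print(word, fits(letter_counts(word), board_counts))
```

"roses" fits the board, while "teeth" does not (one "t" only, and no "h").
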

Any new ideas or improvements are always welcome!

answered 2013-03-07T15:21:36.483