
I have 2 models, Question and Tag, with a HABTM relationship between them. They share the join table questions_tags.

Feast your eyes on this bad boy:

1.9.3p392 :011 > Question.count
   (852.1ms)  SELECT COUNT(*) FROM "questions" 
 => 417 
1.9.3p392 :012 > Tag.count
   (197.8ms)  SELECT COUNT(*) FROM "tags" 
 => 601 
1.9.3p392 :013 > Question.connection.execute("select count(*) from questions_tags").first["count"].to_i
   (648978.7ms)  select count(*) from questions_tags
 => 39919778 

I'm assuming the questions_tags join table contains a bunch of duplicate records. Otherwise I have no idea why it would be so big.
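A quick bound supports this assumption, provided every row points at a question and a tag that actually exist. With 417 questions and 601 tags there are only this many distinct pairs, so the table holds roughly 159 rows for every pair that could legitimately be in it:

$$
417 \times 601 = 250{,}617, \qquad \frac{39{,}919{,}778}{250{,}617} \approx 159.3
$$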

How do I clean up this join table so that it contains only unique rows? Or how can I even check whether there are duplicate records in it?

Edit 1

I'm using PostgreSQL. Here is the schema for the questions_tags join table:

  create_table "questions_tags", :id => false, :force => true do |t|
    t.integer "question_id"
    t.integer "tag_id"
  end

  add_index "questions_tags", ["question_id"], :name => "index_questions_tags_on_question_id"
  add_index "questions_tags", ["tag_id"], :name => "index_questions_tags_on_tag_id"

2 Answers


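I'm adding this as a new answer because it's quite different from my previous one. This one doesn't assume you have an id column on the join table. It selects the unique rows into a new table, then drops the old table and renames the new one in its place. This will be much faster than any approach that uses subselects.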

foo=# select * from questions_tags;
 question_id | tag_id
-------------+--------
           1 |      2
           2 |      1
           2 |      2
           1 |      1
           1 |      1
(5 rows)

foo=# select distinct question_id, tag_id into questions_tags_tmp from questions_tags;
SELECT 4
foo=# select * from questions_tags_tmp;
 question_id | tag_id
-------------+--------
           2 |      2
           1 |      2
           2 |      1
           1 |      1
(4 rows)

foo=# drop table questions_tags;
DROP TABLE
foo=# alter table questions_tags_tmp rename to questions_tags;
ALTER TABLE
foo=# select * from questions_tags;
 question_id | tag_id
-------------+--------
           2 |      2
           1 |      2
           2 |      1
           1 |      1
(4 rows)
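To cover the "how do I check" part of the question as well, here is a minimal sketch of the same copy-and-swap run end to end. It assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, so the SQL is spelled slightly differently: SQLite has no SELECT ... INTO, so CREATE TABLE ... AS SELECT DISTINCT takes its place. It first lists duplicated pairs with GROUP BY ... HAVING COUNT(*) > 1, then rebuilds the table. A table created this way starts with no indexes, so the sketch recreates the two from the question's schema and adds a unique index on (question_id, tag_id). That unique index and its name, index_questions_tags_unique, are hypothetical additions, not part of the answer above.

```python
import sqlite3


def duplicate_pairs(conn):
    """Return (question_id, tag_id, copies) for every pair stored more than once."""
    return conn.execute(
        "SELECT question_id, tag_id, COUNT(*) FROM questions_tags "
        "GROUP BY question_id, tag_id HAVING COUNT(*) > 1 "
        "ORDER BY question_id, tag_id"
    ).fetchall()


def rebuild_without_duplicates(conn):
    """Swap questions_tags for a copy holding only distinct rows.

    Returns (rows_before, rows_after).
    """
    before = conn.execute("SELECT COUNT(*) FROM questions_tags").fetchone()[0]
    conn.execute("DROP TABLE IF EXISTS questions_tags_tmp")
    # SQLite spelling of: SELECT DISTINCT question_id, tag_id INTO questions_tags_tmp ...
    conn.execute(
        "CREATE TABLE questions_tags_tmp AS "
        "SELECT DISTINCT question_id, tag_id FROM questions_tags"
    )
    conn.execute("DROP TABLE questions_tags")  # also drops the old indexes
    conn.execute("ALTER TABLE questions_tags_tmp RENAME TO questions_tags")
    # The copy has no indexes: restore the two from the question's schema...
    conn.execute(
        "CREATE INDEX index_questions_tags_on_question_id ON questions_tags (question_id)"
    )
    conn.execute("CREATE INDEX index_questions_tags_on_tag_id ON questions_tags (tag_id)")
    # ...and add a unique one (hypothetical name) so duplicates cannot creep back in.
    conn.execute(
        "CREATE UNIQUE INDEX index_questions_tags_unique "
        "ON questions_tags (question_id, tag_id)"
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM questions_tags").fetchone()[0]
    return before, after


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions_tags (question_id INTEGER, tag_id INTEGER)")
    # Same five rows as the psql session above.
    conn.executemany(
        "INSERT INTO questions_tags VALUES (?, ?)",
        [(1, 2), (2, 1), (2, 2), (1, 1), (1, 1)],
    )
    print(duplicate_pairs(conn))             # [(1, 1, 2)]
    print(rebuild_without_duplicates(conn))  # (5, 4)
    print(duplicate_pairs(conn))             # []
```

In PostgreSQL, DDL is transactional, so you can wrap the SELECT DISTINCT ... INTO, the DROP TABLE and the ALTER TABLE ... RENAME in a single BEGIN ... COMMIT. A failure partway through then leaves the original table in place.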
answered 2013-03-13T00:51:15.700

Delete tag associations that reference tags which no longer exist

DELETE  FROM questions_tags
WHERE   NOT EXISTS ( SELECT  1 
                 FROM    tags
                 WHERE   tags.id = questions_tags.tag_id);

Delete tag associations that reference questions which no longer exist

DELETE  FROM questions_tags
WHERE   NOT EXISTS ( SELECT  1 
                 FROM    questions
                 WHERE   questions.id = questions_tags.question_id);
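A minimal sketch of both orphan-cleanup deletes running against a toy dataset. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL; these particular NOT EXISTS statements run unchanged in SQLite. The questions and tags tables are reduced to bare id columns, and the ids used in the demo are made up for illustration.

```python
import sqlite3

# The two NOT EXISTS deletes from the answer, unchanged apart from whitespace.
DELETE_BAD_TAG_REFS = """
    DELETE FROM questions_tags
    WHERE NOT EXISTS (SELECT 1 FROM tags WHERE tags.id = questions_tags.tag_id)
"""
DELETE_BAD_QUESTION_REFS = """
    DELETE FROM questions_tags
    WHERE NOT EXISTS (SELECT 1 FROM questions
                      WHERE questions.id = questions_tags.question_id)
"""


def delete_orphans(conn):
    """Run both deletes and return how many join rows were removed in total."""
    removed = conn.execute(DELETE_BAD_TAG_REFS).rowcount
    removed += conn.execute(DELETE_BAD_QUESTION_REFS).rowcount
    conn.commit()
    return removed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE questions_tags (question_id INTEGER, tag_id INTEGER)")
    conn.executemany("INSERT INTO questions VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO tags VALUES (?)", [(1,), (2,)])
    # Question 99 and tag 99 do not exist, so three of these rows are orphans.
    conn.executemany(
        "INSERT INTO questions_tags VALUES (?, ?)",
        [(1, 1), (2, 2), (1, 99), (99, 1), (99, 99)],
    )
    print(delete_orphans(conn))  # 3
    print(conn.execute("SELECT * FROM questions_tags ORDER BY 1, 2").fetchall())
    # [(1, 1), (2, 2)]
```

Running the orphan deletes before deduplicating means the duplicate pass has less data to scan.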

Delete duplicate tag associations

-- Assumes questions_tags has a surrogate id column; the schema in the question does not.
DELETE  FROM questions_tags
USING   ( SELECT qt3.tag_id, qt3.question_id, MIN(qt3.id) id
          FROM   questions_tags qt3
          GROUP BY qt3.tag_id, qt3.question_id
        ) qt2
WHERE   questions_tags.tag_id=qt2.tag_id AND 
        questions_tags.question_id=qt2.question_id AND
        questions_tags.id != qt2.id;
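Because the join table has no id column, the duplicate delete needs some other way to tell identical rows apart. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL: SQLite's hidden rowid plays the role of the missing id, and a row is deleted whenever an identical (question_id, tag_id) row with a larger rowid exists, so exactly one copy of each pair survives. PostgreSQL has no rowid. Its system column ctid identifies a row's physical location and is commonly used the same way (comparing a.ctid < b.ctid in a self-join), but ctid changes when a row is updated, so only rely on it while nothing else is writing to the table. On a table of roughly 40 million rows, the copy-and-rename approach in the other answer may well be faster.

```python
import sqlite3

# Delete a row whenever an identical (question_id, tag_id) row with a larger
# rowid exists, so only the last copy of each pair survives.
# rowid is SQLite's hidden row identifier, standing in for the id column
# that the answer's query assumes. Rows with NULL ids never compare equal
# here and are left alone.
DELETE_DUPLICATES = """
    DELETE FROM questions_tags
    WHERE EXISTS (
        SELECT 1 FROM questions_tags AS later
        WHERE later.question_id = questions_tags.question_id
          AND later.tag_id = questions_tags.tag_id
          AND later.rowid > questions_tags.rowid
    )
"""


def delete_duplicates(conn):
    """Delete surplus copies in place and return how many rows were removed."""
    removed = conn.execute(DELETE_DUPLICATES).rowcount
    conn.commit()
    return removed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions_tags (question_id INTEGER, tag_id INTEGER)")
    conn.executemany(
        "INSERT INTO questions_tags VALUES (?, ?)",
        [(1, 2), (2, 1), (2, 2), (1, 1), (1, 1)],
    )
    print(delete_duplicates(conn))  # 1
    print(conn.execute("SELECT * FROM questions_tags ORDER BY 1, 2").fetchall())
    # [(1, 1), (1, 2), (2, 1), (2, 2)]
```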

Notes:

Please test these SQL statements in your development environment before running them in production.

answered 2013-03-12T23:19:28.007