
I have found that many of my User objects share the same email address. I need to remove those duplicates.

User.select(:email).group(:email).having('COUNT(email) > 1')

I tried this query (similar to a previous question here), but I get an empty array. Any idea why?

2.0.0p247 :277 > User.select(:email).group(:email).having('COUNT(email) > 1')
  User Load (7801.4ms)  SELECT email FROM "users" GROUP BY email HAVING COUNT(email) > 1
  EXPLAIN (0.4ms)  EXPLAIN SELECT email FROM "users" GROUP BY email HAVING COUNT(email) > 1
EXPLAIN for: SELECT email FROM "users"  GROUP BY email HAVING COUNT(email) > 1
                                 QUERY PLAN
-----------------------------------------------------------------------------
 GroupAggregate  (cost=676876.34..739393.66 rows=3125866 width=22)
   Filter: (count(email) > 1)
   ->  Sort  (cost=676876.34..684691.01 rows=3125866 width=22)
         Sort Key: email
         ->  Seq Scan on users  (cost=0.00..147342.66 rows=3125866 width=22)
(5 rows)

 => [] 

Update: Dave's solution does not work for me either.

2.0.0p247 :004 > User.select('email, count(email)').group('email').having('count(email) > 1')
  User Load (7858.0ms)  SELECT email, count(email) FROM "users" GROUP BY email HAVING count(email) > 1
  EXPLAIN (0.4ms)  EXPLAIN SELECT email, count(email) FROM "users" GROUP BY email HAVING count(email) > 1
EXPLAIN for: SELECT email, count(email) FROM "users"  GROUP BY email HAVING count(email) > 1
                                 QUERY PLAN
-----------------------------------------------------------------------------
 GroupAggregate  (cost=676876.34..747208.33 rows=3125866 width=22)
   Filter: (count(email) > 1)
   ->  Sort  (cost=676876.34..684691.01 rows=3125866 width=22)
         Sort Key: email
         ->  Seq Scan on users  (cost=0.00..147342.66 rows=3125866 width=22)
(5 rows)

2 Answers


How about this?

User.select('email, count(email)').group('email').having('count(email) > 1')
answered 2013-07-15T20:50:48.953

You can do this:

emails = User.pluck(:email)  # load every email once instead of re-querying inside the block
duplicates = User.where(email: emails.group_by { |e| e }.select { |_, v| v.size > 1 }.keys)

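However, this loads every email into memory and can be slow on a large table. There may be a better way to do this with Active Record, but I couldn't find one with a quick Google search.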
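To make the in-memory approach concrete, here is a minimal sketch in plain Ruby that groups ids by email in a single pass and keeps the lowest id for each address. The `rows` array is hypothetical sample data standing in for something like `User.pluck(:id, :email)` (an assumption: multi-column `pluck` needs Rails 4 or later), and keeping the lowest id is just one possible rule for picking the survivor.

```ruby
# Hypothetical sample rows as [id, email] pairs; in an app these might
# come from User.pluck(:id, :email) (assumption, Rails 4+).
rows = [
  [1, "a@example.com"],
  [2, "b@example.com"],
  [3, "a@example.com"],
  [4, "c@example.com"],
  [5, "a@example.com"],
  [6, "b@example.com"]
]

# Collect the ids for each email in one pass over the rows.
ids_by_email = Hash.new { |hash, key| hash[key] = [] }
rows.each { |id, email| ids_by_email[email] << id }

# Keep only the emails that occur more than once.
duplicate_groups = ids_by_email.select { |_email, ids| ids.size > 1 }

# Keep the lowest id per email; the remaining ids are deletion candidates.
ids_to_delete = duplicate_groups.values.flat_map { |ids| ids.sort.drop(1) }
```

In a Rails app the resulting ids could then be reviewed by hand and passed to something like `User.where(id: ids_to_delete).delete_all`. Note that the grouping is an exact string match, so addresses that differ only in case or surrounding whitespace are not treated as duplicates here.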

answered 2013-07-15T20:52:08.827