I am trying to write a query that deletes all but the most recently added row for a given user. The most recent row is determined by a Timestamp field, and the user by a User ID field. Both are stored as strings.

table.Timestamp -> text field
table.Retrieving_User -> text field

This is the query I have developed. We have around 50K records in this database, and it runs very slowly. I hope it's not because of the string-to-date conversion I'm doing, because that conversion is required.

DELETE 
FROM `table` main
WHERE (main.Retrieving_User, STR_To_DATE( main.Timestamp , '%a %b %d %H:%i:%s CST %Y' )) NOT IN 
    (SELECT  sub.Retrieving_User, MAX( STR_To_DATE( sub.Timestamp , '%a %b %d %H:%i:%s CST %Y' )) 
    FROM `table` sub
    WHERE sub.Retrieving_User = 'userID'
    GROUP BY sub.Retrieving_User )
AND main.Retrieving_User = 'userID'

Does anyone know of a more efficient way of doing what I'm trying to do?

3 Answers

Something like this might run faster, because it avoids the NOT IN clause, which may loop repeatedly over an in-memory table. Back up your data first, then try:

DELETE 
FROM `table` main
WHERE STR_To_DATE( main.Timestamp , '%a %b %d %H:%i:%s CST %Y' )<
  (SELECT  MAX( STR_To_DATE( sub.Timestamp , '%a %b %d %H:%i:%s CST %Y' ))
   FROM `table` sub
   WHERE sub.Retrieving_User = main.Retrieving_User )
AND main.Retrieving_User = 'userID'
Answered 2012-08-29T13:19:57.487

Whenever you're deleting many rows and keeping far fewer rows than you delete, this trick from the MySQL documentation works really well:

If you are deleting many rows from a large table, you may exceed the lock table size for an InnoDB table. To avoid this problem, or simply to minimize the time that the table remains locked, the following strategy (which does not use DELETE at all) might be helpful:

Select the rows not to be deleted into an empty table that has the same structure as the original table:

INSERT INTO t_copy SELECT * FROM t WHERE ... ;

Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:

RENAME TABLE t TO t_old, t_copy TO t;

Drop the original table:

DROP TABLE t_old;
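
Applied to the table in the question, the steps above might look like the sketch below. It assumes you want to keep the newest row for every user rather than for a single 'userID', and table_copy and table_old are hypothetical names. If two rows for a user share the same latest timestamp, both are kept.

```sql
-- Assumption: table_copy and table_old do not exist yet.
-- Create an empty table with the same structure as the original.
CREATE TABLE table_copy LIKE `table`;

-- Copy only the rows to keep: the newest row per user.
INSERT INTO table_copy
SELECT main.*
FROM `table` AS main
JOIN (
    -- Latest parsed timestamp for each user.
    SELECT Retrieving_User,
           MAX(STR_TO_DATE(`Timestamp`, '%a %b %d %H:%i:%s CST %Y')) AS max_time
    FROM `table`
    GROUP BY Retrieving_User
) AS latest
  ON latest.Retrieving_User = main.Retrieving_User
 AND STR_TO_DATE(main.`Timestamp`, '%a %b %d %H:%i:%s CST %Y') = latest.max_time;

-- Swap the tables, then drop the original.
RENAME TABLE `table` TO table_old, table_copy TO `table`;
DROP TABLE table_old;
```

Rows inserted into the original table between the INSERT and the RENAME would be lost, so this is best run while nothing else is writing to the table.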

Another way to improve delete time with MyISAM is to use DELETE QUICK followed by OPTIMIZE TABLE, also from the MySQL documentation:

If you are going to delete many rows from a table, it might be faster to use DELETE QUICK followed by OPTIMIZE TABLE. This rebuilds the index rather than performing many index block merge operations.
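
For the table in the question, that could be sketched roughly as follows. This assumes a MyISAM table, since the QUICK modifier is documented for MyISAM. It uses the multi-table DELETE form with a derived table, so the subquery is not reading the delete target directly.

```sql
-- Sketch only: delete every row for 'userID' that is older than
-- that user's latest row, skipping index leaf merges (QUICK).
DELETE QUICK main
FROM `table` AS main
JOIN (
    -- Latest parsed timestamp for the user being cleaned up.
    SELECT Retrieving_User,
           MAX(STR_TO_DATE(`Timestamp`, '%a %b %d %H:%i:%s CST %Y')) AS max_time
    FROM `table`
    WHERE Retrieving_User = 'userID'
    GROUP BY Retrieving_User
) AS latest
  ON latest.Retrieving_User = main.Retrieving_User
WHERE STR_TO_DATE(main.`Timestamp`, '%a %b %d %H:%i:%s CST %Y') < latest.max_time;

-- Rebuild the table and its indexes after the bulk delete.
OPTIMIZE TABLE `table`;
```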

Here's IvoTops' answer, optimized. We simply convert the maximum date back to a string, so the outer query can compare the raw Timestamp text and does not have to convert every row again:

DELETE 
FROM `table` main
WHERE main.Timestamp <>
  (SELECT DATE_FORMAT(MAX(STR_To_DATE( sub.Timestamp , '%a %b %d %H:%i:%s CST %Y')), '%a %b %d %H:%i:%s CST %Y')
   FROM `table` sub
   WHERE sub.Retrieving_User = main.Retrieving_User )
AND main.Retrieving_User = 'userID'
Answered 2012-08-29T13:41:48.663

I think your performance issue is linked to the NOT IN clause. You'd probably be better off joining against a derived table:

DELETE main
FROM `table` main,
     -- Derived table: the latest parsed timestamp for the user
     (SELECT  sub.Retrieving_User, MAX( STR_To_DATE( sub.Timestamp , '%a %b %d %H:%i:%s CST %Y' )) maxTime
      FROM `table` sub
      WHERE sub.Retrieving_User = 'userID'
      GROUP BY sub.Retrieving_User) latest
WHERE STR_To_DATE( main.Timestamp , '%a %b %d %H:%i:%s CST %Y' ) < latest.maxTime
  AND main.Retrieving_User = 'userID';
Answered 2012-08-29T13:21:43.350