I presume it takes a long time because (a) you're deleting millions of rows and (b) you are treating your log like a revolving door. This isn't going to magically go from 5-8 minutes to 5 seconds because you use EXISTS instead of IN, change a subquery to a CTE, or switch to a JOIN. Go ahead and try it, I bet it is no better:
DELETE ml
FROM dbo.MailingListTable AS ml
INNER JOIN dbo.ListItems AS li
ON ml.Md5Hash = li.Md5Hash
INNER JOIN dbo.Lists AS l
ON l.Id = li.ListId
WHERE l.IsGlobal = 1;
The problem is almost certainly the I/O involved in performing the DELETE, not the method used to identify the rows to delete. I bet a SELECT that identifies the exact same rows, with no change to index structure and regardless of isolation level, does NOT take 5-8 minutes.
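To confirm that, you can time the read side on its own. This is just the same join logic as the DELETE above, reshaped as a count:

```sql
SELECT COUNT(*)
FROM dbo.MailingListTable AS ml
INNER JOIN dbo.ListItems AS li
    ON ml.Md5Hash = li.Md5Hash
INNER JOIN dbo.Lists AS l
    ON l.Id = li.ListId
WHERE l.IsGlobal = 1;
```

If this returns in seconds while the DELETE takes minutes, the difference is the write and log activity, not row identification.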
So, how to fix?
First, make sure that your log is tuned to handle transactions of this size:

- Pre-size the log so that it never has to grow during such an operation, perhaps to double the largest size you've ever seen it reach. The exact ideal size is not something someone on Stack Overflow can tell you.
- Make sure auto-growth is not set to silly defaults like 10% or 1MB. Autogrow should be a fallback; when you need it, it should happen exactly once, not multiple times to cover any single operation. So make sure the growth increment is a fixed size (making the size and duration predictable) and that the size is reasonable (so that it only happens once). What is reasonable? No idea - too many "it depends."
- Disable any jobs that shrink the log - permanently. Deal with an out-of-control log on a case-by-case basis instead of "preventing" log growth by repeatedly shrinking the log file.
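The sizing and growth settings above can be applied in one statement. The database name, logical file name, and sizes here are placeholders; pick values based on your own observed log usage:

```sql
ALTER DATABASE YourDatabase
    MODIFY FILE
    (
        NAME = YourDatabase_log,  -- logical name of the log file (see sys.database_files)
        SIZE = 8GB,               -- pre-size: e.g. double the largest size you've observed
        FILEGROWTH = 512MB        -- fixed increment, never a percentage
    );
```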
Next, consider changing your query to batch those deletes into chunks. You can experiment with the TOP (?) parameter, balancing how many rows per chunk lead to what kind of transaction duration (there is no magic formula for this, even if we had a lot more information).
CREATE TABLE #x
(
Md5Hash SOME_DATA_TYPE_I_DO_NOT_KNOW PRIMARY KEY
);
INSERT #x SELECT DISTINCT li.Md5Hash
FROM dbo.ListItems AS li
INNER JOIN dbo.Lists AS l
ON l.Id = li.ListId
WHERE l.IsGlobal = 1;
DECLARE @p TABLE(Md5Hash SOME_DATA_TYPE_I_DO_NOT_KNOW PRIMARY KEY);
DECLARE @rc INT = 1;
WHILE @rc > 0
BEGIN
DELETE @p;
DELETE TOP (?) FROM #x
OUTPUT deleted.Md5Hash INTO @p;
SET @rc = @@ROWCOUNT;
BEGIN TRANSACTION;
DELETE ml FROM dbo.MailingListTable AS ml
WHERE EXISTS (SELECT 1 FROM @p WHERE Md5Hash = ml.Md5Hash);
COMMIT TRANSACTION;
-- to minimize log impact you may want to CHECKPOINT
-- or backup the log here, every loop or every N loops
END
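On the CHECKPOINT / log backup comment in the loop: which one applies depends on your recovery model. Under SIMPLE recovery, a CHECKPOINT lets the log truncate between chunks; under FULL recovery, only a log backup does. A sketch, with placeholder database name and backup path:

```sql
-- SIMPLE recovery model: allow log truncation between chunks
CHECKPOINT;

-- FULL recovery model: back up the log instead (placeholder name/path)
BACKUP LOG YourDatabase
    TO DISK = 'X:\Backups\YourDatabase_log.trn';
```

Either one, run every loop or every N loops, keeps any single stretch of the operation from pinning the whole log as active.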
This may extend the total amount of time the operation takes (especially if you back up or checkpoint on each loop, or add an artificial delay using WAITFOR, or both), but it should allow other transactions to sneak in between chunks, waiting on short transactions instead of the whole process. Also, because each chunk has less individual impact on the log, it may actually end up finishing a lot faster. But I have to assume that the problem isn't that it takes 5-8 minutes; it's probably that it takes 5-8 minutes and blocks. This should alleviate that considerably (and if it does, why do you care how long it takes?).
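If you do want an explicit pause between chunks, a WAITFOR DELAY at the bottom of the loop gives concurrent transactions a guaranteed window; the duration here is arbitrary and worth tuning:

```sql
WAITFOR DELAY '00:00:02';  -- pause two seconds between chunks
```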
I wrote a lot more about this technique here.