I have a table with ~2.8 million rows and 3 columns. Each row represents one marketing touch from the company and has "customer_id", "marketing_type", and "week_num". There is an index on "customer_id" and another index on "marketing_type".
An example of the data:
72, catalog, 7
72, email, 3
99, catalog, 13
82, catalog, 7
I need a list of all customer_ids that had an email but didn't have a catalog. (There are other types of marketing, and there are customer_ids that didn't get anything.)
First try:
SELECT DISTINCT cust_id
FROM marketing_campaign
WHERE marketing_type = 'email'
AND cust_id NOT IN (
SELECT cust_id
FROM marketing_campaign
WHERE marketing_type = 'catalog'
)
;
This query takes 30+ minutes to run.
Second try:
SELECT m1.cust_id
FROM marketing_campaign m1
LEFT OUTER JOIN marketing_campaign m2
ON m1.cust_id = m2.cust_id
AND m2.MARKETING_TYPE = 'catalog'
WHERE m1.MARKETING_TYPE = 'email'
AND m2.cust_id IS NULL
;
This query executes in 3.8 seconds, but fetches for 30+ minutes.
Third try:
SELECT DISTINCT cust_id
FROM marketing_campaign a
WHERE MARKETING_TYPE = 'email'
AND NOT EXISTS (
SELECT 'X'
FROM marketing_campaign b
WHERE a.cust_id = b.cust_id
AND b.MARKETING_TYPE = 'catalog'
)
ORDER BY cust_id
;
This query also executes in <5 seconds, but then fetches for 20+ minutes.
Can anyone suggest an alternative?
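One formulation not tried above is a single-pass conditional aggregation, which reads the table once instead of probing it per customer. This is only a minimal sketch: it assumes the table and column names used in the queries above (marketing_campaign, cust_id, marketing_type), it has not been timed against the real data, and whether it is actually faster depends on the database and its optimizer.

```sql
-- Sketch only: customers with at least one 'email' row and no 'catalog' row.
-- Assumes the names used in the queries above; not benchmarked.
SELECT cust_id
FROM marketing_campaign
-- Keep only the two marketing types that matter for the comparison.
WHERE marketing_type IN ('email', 'catalog')
GROUP BY cust_id
-- A group with zero 'catalog' rows can only contain 'email' rows,
-- so every surviving cust_id had an email and never had a catalog.
HAVING SUM(CASE WHEN marketing_type = 'catalog' THEN 1 ELSE 0 END) = 0
;
```

Because of the GROUP BY, each cust_id appears once, so no DISTINCT is needed. With the four sample rows shown earlier, customer 72 is excluded (it has both an email and a catalog), and 99 and 82 are excluded (catalog only), so the sample yields no rows.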