I have a table with ~2.8 million rows and 3 columns. Each row represents a marketing touch from the company and has "customer_id", "marketing_type", and "week_num". There is an index on "customer_id" and another index on "marketing_type".

An example of the data:

72, catalog,  7
72, email,    3
99, catalog, 13
82, catalog,  7

I need a list of all customer_ids that received an email but did not receive a catalog. (There are other types of marketing, and there are customer_ids that didn't get anything.)

First try:

SELECT DISTINCT cust_id
FROM marketing_campaign
WHERE marketing_type = 'email'
AND cust_id NOT IN (
 SELECT cust_id
 FROM marketing_campaign
 WHERE marketing_type = 'catalog'
 )
;

This query takes 30+ minutes to run.


Second try:

SELECT m1.cust_id 
FROM marketing_campaign m1
LEFT OUTER JOIN marketing_campaign m2 
  ON m1.cust_id = m2.cust_id 
 AND m2.MARKETING_TYPE = 'catalog'
WHERE m1.MARKETING_TYPE = 'email'
 AND m2.cust_id IS NULL
;

This query executes in 3.8 seconds, but fetches for 30+ minutes.


Third try:

SELECT distinct cust_id
FROM   marketing_campaign a
WHERE  MARKETING_TYPE = 'email'
  AND  NOT EXISTS (
           SELECT 'X'
           FROM   marketing_campaign b
           WHERE  a.cust_id = b.cust_id
           AND    MARKETING_TYPE = 'catalog' 
           )
ORDER BY cust_id
;

This query also executes in <5 seconds, but then fetches for 20+ minutes.


Can anyone suggest an alternative?

1 Answer

Don't overlook composite indexes:

ALTER TABLE marketing_campaign ADD KEY (marketing_type, cust_id);

Then use query #2.
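
Put together, the two steps might look like the sketch below. It assumes MySQL and the `cust_id` column name used in the question's queries; the index name `idx_type_cust` is a hypothetical choice, and `DISTINCT` is an addition, since a customer with several email rows would otherwise be returned once per row.

```sql
-- Composite index: marketing_type first (equality filter), then cust_id (join key).
-- The name idx_type_cust is illustrative only; it does not appear in the answer above.
ALTER TABLE marketing_campaign
  ADD KEY idx_type_cust (marketing_type, cust_id);

-- Query #2 from the question (anti-join via LEFT JOIN ... IS NULL).
-- DISTINCT is an assumption about the desired output: one row per customer.
SELECT DISTINCT m1.cust_id
FROM marketing_campaign m1
LEFT OUTER JOIN marketing_campaign m2
  ON m1.cust_id = m2.cust_id
 AND m2.marketing_type = 'catalog'
WHERE m1.marketing_type = 'email'
  AND m2.cust_id IS NULL;
```

Prefixing the `SELECT` with `EXPLAIN` is one way to check whether the optimizer actually picks the new index. If it does, the `key` column should name it for both `m1` and `m2`, and `Extra` will typically include `Using index`, because both columns the query touches are contained in the index.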

Also make sure you have tuned the buffers to be large enough for the indexes to reside in RAM.
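
Which buffer matters depends on the storage engine, which the question does not state. As a rough check, assuming MySQL, the statements below show the relevant buffer sizes next to the approximate size of the table and its indexes.

```sql
-- InnoDB: size of the buffer pool, in bytes (caches both data and index pages).
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- MyISAM: size of the key cache, in bytes (caches index blocks only).
SHOW VARIABLES LIKE 'key_buffer_size';

-- Approximate data and index sizes, in bytes, for the table in the current schema.
SELECT table_name, engine, data_length, index_length
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'marketing_campaign';
```

If `index_length` is well below the buffer reported for the table's engine, the indexes can plausibly stay cached. These figures are estimates, and other tables compete for the same memory.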

Answered 2013-05-10T15:13:27.070