I have two tables (I cannot change them):
Parent (id, date, amount)
Child (parent_id, key, value)
with these indexes:
Parent.pk (id)
Parent.idx1 (id, date) include (amount)
Child.pk (parent_id, key)
Child.idx1 (parent_id, key, value)
and this query:
select sum(amount)
from Parent as p
left outer join Child as c1 on c1.parent_id = p.id and c1.[key] = 'X' -- [key] is bracketed: KEY is a reserved word in T-SQL
left outer join Child as c2 on c2.parent_id = p.id and c2.[key] = 'Y'
where p.date between '20120101' and '20120131'
and c1.value = 'x1'
and c2.value = 'y1'
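For reference, the schema and indexes listed above could be written out roughly as follows. This is only a sketch: the column types and the constraint names (Parent_pk, Child_pk) are assumptions for illustration, not the real definitions, and I assume the primary keys are the clustered indexes.

```sql
-- Sketch only: column types and constraint names are assumed, not the real schema.
create table Parent (
    id     int           not null,
    [date] date          not null,
    amount decimal(18,2) not null,
    constraint Parent_pk primary key clustered (id)
);
create index idx1 on Parent (id, [date]) include (amount);

create table Child (
    parent_id int         not null,
    [key]     varchar(20) not null,  -- bracketed: KEY is a reserved word in T-SQL
    value     varchar(20) not null,
    constraint Child_pk primary key clustered (parent_id, [key])
);
create index idx1 on Child (parent_id, [key], value);
```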
The problem is performance.
Parent has ~1,500,000 records and Child has ~6,000,000 records.
Take 1
This query takes ~3 s, which is too slow for my scenario: it must take less than a few milliseconds.
The execution plan shows that SQL Server does an index scan on Parent.idx1 and then a merge join with a Child.idx1 clustered index seek. That is not optimal, because it scans all 1,500,000 Parent records even though I filter them by date.
Take 2
When I change Parent.idx1 to Parent.idx1 (date, id) include (amount), SQL Server chooses a clustered index scan on Parent.pk and then again a merge join with Child.idx1. Execution time is ~6 s.
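The index change in this take can be sketched as a single statement. This assumes the index is rebuilt in place under the same name; DROP_EXISTING replaces the old (id, date) definition.

```sql
-- Sketch: redefine Parent.idx1 with date as the leading key column.
create index idx1 on Parent ([date], id)
    include (amount)
    with (drop_existing = on);
```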
Take 3
When I force it to use Parent.idx1 (date, id) include (amount), it sorts the result before the merge join, and execution time is even worse: ~11 s.
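Forcing an index like this is typically done with a table hint. A minimal sketch of the query with such a hint, assuming the index keeps the name idx1:

```sql
-- Sketch: force Parent.idx1 (date, id) include (amount) with a table hint.
select sum(p.amount)
from Parent as p with (index(idx1))
left outer join Child as c1 on c1.parent_id = p.id and c1.[key] = 'X'
left outer join Child as c2 on c2.parent_id = p.id and c2.[key] = 'Y'
where p.[date] between '20120101' and '20120131'
  and c1.value = 'x1'
  and c2.value = 'y1';
```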
Take 4
I tried to create an indexed view, but cannot use one because of the LEFT OUTER JOIN.
Is there any way to make such a query (a Parent-Child join with filters on both tables) faster, without de-normalization?
Update 2013-07-04:
To those answering "use INNER JOIN": yes, it's much faster, but I cannot use it.
What I showed here is a simplified version of what I really need.
I need to create an SQL view over the MS Dynamics NAV "G/L Entry" (Parent) and "Ledger Entry Dimension" (Child) tables so that I can read it from that application.
The complete view currently looks like this:
create view analysis
as
select
v.id as view_id
, p.date
, p.Amount
, c1.value as value1
, c2.value as value2
, c3.value as value3
, c4.value as value4
from Parent as p
cross join analysis_view as v
left outer join Child as c1 on c1.parent_id = p.id and c1.[key] = v.key1 -- [key] is bracketed: KEY is a reserved word in T-SQL
left outer join Child as c2 on c2.parent_id = p.id and c2.[key] = v.key2
left outer join Child as c3 on c3.parent_id = p.id and c3.[key] = v.key3
left outer join Child as c4 on c4.parent_id = p.id and c4.[key] = v.key4
where analysis_view currently contains 8 records and looks like this: analysis_view (id, key1, key2, key3, key4)
The application may then query it like this:
select sum(amount)
from analysis
where view_id = 1 and date between '20120101' and '20120131'
and value1 = 'x1'
and value2 = 'x2'
or
select sum(amount)
from analysis
where view_id = 1 and date between '20120101' and '20120131'
and value1 = 'x1'
and value3 = 'z1'
MS Dynamics NAV already has a de-normalized table for this, and queries against it are fast, but it is huge in our case (~10 GB) and it locks the whole system for around an hour whenever someone creates a new analysis view. Also, NAV doesn't know how to produce joins, which is why I must define the view on the SQL Server side.