
In the code_list CTE in this query I have a row constructor that will eventually take any number of arguments. The column icd in the patient_codes CTE is a five-digit identifier that is more descriptive than the three-digit codes that the row constructor has. The table icd_patient has 100 million rows, so for performance's sake I would like to filter the rows of this table before I do any further work. I have:

;with code_list(code_list)
as
(
    select  x.code_list
      from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(

select distinct  icd,pat_id,id
    from icd_patient
    where icd in (select code_list from code_list)
)
select distinct pat_id from patient_codes

The problem, however, is that in the icd_patient table all of the icd values are five digits long and more descriptive. If I look at the execution plan of this query, it's pretty streamlined. If I do:

;with code_list(code_list)
as
(
    select  x.code_list
      from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
    select substring(icd,1,3) as icd,pat_id
      from icd_patient2
      where substring(icd,1,3) in (select * from code_list)
)
select * from patient_codes

this of course has a large performance impact because of the substring expression in the where clause. Does something akin to LIKE IN exist so I can take advantage of my indexes?

Index on icd_patient:

CREATE NONCLUSTERED INDEX [ix_icd_patient] ON [dbo].[icd_patient2] ( [pat_id] ASC ) INCLUDE ( [id],


3 Answers


This much simpler query should be better than (or, at worst, the same as) your existing query.

select pat_id
    FROM dbo.icd_patient
    where icd LIKE '707%'
       OR icd LIKE '250%'
GROUP BY pat_id;

Note that sargability only matters if there is actually an index on this column.
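For illustration, an index that a prefix LIKE on icd could seek might look like the sketch below. The index name is hypothetical, and including pat_id is an assumption based on the queries above returning only pat_id; the index shown in the question is keyed on pat_id, so it cannot support a seek on icd.

```sql
-- Hypothetical index name; assumes icd is a character column on dbo.icd_patient.
-- Keying on icd lets predicates such as icd LIKE '707%' become a range seek.
-- Including pat_id is an assumption so the queries above are covered without lookups.
CREATE NONCLUSTERED INDEX ix_icd_patient_icd
    ON dbo.icd_patient (icd ASC)
    INCLUDE (pat_id);
```

Whether the optimizer actually picks a seek still depends on statistics and selectivity, so it is worth checking the execution plan after creating it.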

An alternative (since OR can sometimes give the optimizer fits):

SELECT pat_id FROM 
(
  SELECT pat_id
    FROM dbo.icd_patient
    WHERE icd LIKE '707%'
  UNION ALL
  SELECT pat_id
    FROM dbo.icd_patient
    WHERE icd LIKE '250%'
) AS x
GROUP BY pat_id;

To make this extensible beyond a handful of OR conditions, I would use a table-valued parameter (TVP).

CREATE TYPE dbo.StringPatterns AS TABLE(s VARCHAR(3) PRIMARY KEY);

Then your stored procedure could say:

CREATE PROCEDURE dbo.whatever
  @sp dbo.StringPatterns READONLY
AS
BEGIN
  SET NOCOUNT ON;

  SELECT p.pat_id
    FROM dbo.icd_patient AS p
    INNER JOIN @sp AS sp
    ON p.icd LIKE sp.s + '%'
  GROUP BY p.pat_id;
END

Then you can pass in your set of three-character substrings from a DataTable or other collection in C#. From T-SQL just as an example:

DECLARE @p dbo.StringPatterns;
INSERT @p VALUES('707'),('250');
EXEC dbo.whatever @sp = @p;
answered 2013-03-30T20:36:55.417

Something like LIKE IN does not exist. The following is sargable:

select *
from icd_patient
where icd like '70700%' or
      icd like '25002%'

This is sargable because LIKE with a constant initial substring is a special case for SQL Server. This does not work when the strings on the right are variables.

One solution is to create an indexed view on the icd_patient table with an index on the first five characters of the icd code.
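A rough sketch of that indexed-view idea follows. The view, index, and column names are hypothetical, and the unique clustered index assumes that id is unique per row in icd_patient, which the question does not confirm. The answer mentions the first five characters; because the question filters on three-character codes, this sketch uses LEFT(icd, 3), which is an assumption to adjust as needed.

```sql
-- Indexed views require SCHEMABINDING and two-part object names, plus the
-- usual SET options (ANSI_NULLS, QUOTED_IDENTIFIER, and so on) set to ON.
CREATE VIEW dbo.icd_patient_prefix
WITH SCHEMABINDING
AS
SELECT id,
       pat_id,
       LEFT(icd, 3) AS icd_prefix   -- assumed prefix length of 3
FROM dbo.icd_patient;
GO

-- The first index on a view must be unique and clustered.
-- Uniqueness here relies on the assumption that id is unique.
CREATE UNIQUE CLUSTERED INDEX cix_icd_patient_prefix
    ON dbo.icd_patient_prefix (icd_prefix, pat_id, id);
GO

-- Equality on the materialized prefix can now seek the view's index.
-- NOEXPAND asks the optimizer to use the view's index directly.
SELECT v.pat_id
FROM dbo.icd_patient_prefix AS v WITH (NOEXPAND)
WHERE v.icd_prefix IN ('707', '250')
GROUP BY v.pat_id;
```

On a table of this size, building and maintaining the view's index has a real storage and write cost, so this trade-off is worth measuring before adopting it.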

answered 2013-03-30T20:38:10.580

Using "IN" makes that part of a command non-sargable on both sides. End of discussion.

Saying that substring fixes it is wrong: it completely changes what the query returns, and the predicate remains non-sargable.

Any "fix" should exactly match the original results. The actual fix is one of the following: join to the CTE so that all five characters match; put three characters in the CTE and match on those in a join; or put four characters in the CTE, where the fourth is "%", and match in a join using LIKE.
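The last of those options could be sketched as follows, reusing the table and column names from the question. The CTE column name pattern and the three-digit values are illustrative assumptions, and whether the join seeks an index on icd depends on such an index existing and on the plan the optimizer chooses.

```sql
-- Each CTE row carries a three-character prefix followed by the % wildcard.
WITH code_list (pattern) AS
(
    SELECT x.pattern
    FROM (VALUES ('707%'), ('250%')) AS x(pattern)
)
SELECT DISTINCT p.pat_id
FROM dbo.icd_patient AS p
INNER JOIN code_list AS c
    ON p.icd LIKE c.pattern;   -- prefix match against each pattern row
```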

Using a "like" pattern that starts with "%" increases the complexity of the search, but it would still use the index to find the value, because scanning the index should require less reading: the full table row is fetched only when a match is found.

answered 2014-01-26T03:20:32.073