-5

Am I missing something?

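What I want to create is basically an index table delimited by spaces (or whatever type of delimiter you prefer). I realize that full-text search is not possible on a data column of type int alone, because it treats a space as the delimiter separating the data to be indexed across the catalog.

I do realize that it lets me index varbinary data, but why not plain space-delimited int data, rather than requiring data that mixes integers and text to search on? That is, a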

 SELECT * FROM MyTable
 WHERE CONTAINS(indexedColumn, '1189')

against a full-text index/catalog defined on a table like this:

 indexedColumn      secondDelimitedIntColumn
 1189               34 34209 1989 3 5

is not possible, but

 SELECT * FROM MyTable
 WHERE CONTAINS(uniqueColumn, 'a1189')

will use the full-text index on a table with the following columns:

 uniqueColumn secondDelimitedIntColumn
 a1189        b34 b34209 b1989 b3 b5  

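So basically, a CONTAINS() search on any full-text-indexed column only works when some text is attached to the integer strings.

But my question is: "Why can't I just use a string of integers separated by spaces, so that I don't have to add dummy text to trick SQL Server into letting me run full-text searches on indexed integer strings?"

Thanks in advance!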

4

3 Answers

6

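This really isn't a question. There are no details about the query you're trying to run or the schema you're running it against. I'm not sure what to tell you here. If some details were available, I might be able to help. This reads more like a complaint than a question.

I'm fully aware that this belongs in the comments rather than in an answer, but I don't have the points on Overflow. I live on .dba.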

answered 2012-10-18T20:55:38.780
5

Updated with XML example, below

Your current design violates 1st normal form.

That, in itself, is okay. Over some years, I've inherited and had to maintain several systems that did so. I don't know why they were built that way. It doesn't really matter. They had to be maintained and the schedule wasn't always such that there was time for refactoring, testing and validation, not to mention doing so for the stack of apps that were built upon them.

Looking back now, though, I can easily spot the one attribute that they all shared. It was the absolute biggest barrier to optimizing and extending these systems: the underlying "relational" database violated 1st normal form. It was the root cause of virtually every technical "gotcha" and virtually every performance problem we encountered. Splitting strings. Creating a faux datatype system to validate them. Creating further delimited attributes to describe them. Creating special rules for each delimited "location" and having to implement an EVAL function in many systems to enforce them. Using dynamic SQL or worse to search it all. It took more "clever" programming to implement what seemed like conceptually simple features than I care to recollect.

Maybe your system is different. Maybe 40+ years of relational database research does not apply to your situation. For your sake, I truly hope so. The only problem is that you're using a relational database in a non-relational way. Just like you can pound screws with a hammer, and you can pull a boat with a motorcycle (don't hit the brakes if you actually get it going), you can create an index (full-text or b-tree) on text that represents integers.

But why would you do any of these things? Why wouldn't you actually store the integers as integers and enjoy type-safety? Why wouldn't you normalize this into two related tables to take advantage of smaller transactions and more indexing options? If you've inherited a system that you can't change, then please say so and people might be able to help with alternatives (TVPs and XML have been rightfully mentioned). But I can't see coming into the situation saying that your hammer and motorcycle are broken because they don't drive screws and pull boats very well.
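As a rough sketch of what that normalization could look like: the table, column, and index names below are made up for illustration and are not from your system. I'm also assuming each integer appears at most once per row (the composite primary key enforces that) and SQL Server 2008 or later for the multi-row VALUES syntax.

```sql
-- Hypothetical parent table: one row per entity
create table Item (
    ItemID int not null primary key
);

-- Hypothetical child table: one row per integer that used to live in the delimited string
create table ItemValue (
      ItemID int not null references Item (ItemID)
    , Value  int not null
    , primary key (ItemID, Value)
);

-- Ordinary b-tree index that supports searching by value
create index IX_ItemValue_Value on ItemValue (Value, ItemID);

-- Sample data taken from the question's example row
insert into Item (ItemID) values (1189);
insert into ItemValue (ItemID, Value)
values (1189, 34), (1189, 34209), (1189, 1989), (1189, 3), (1189, 5);

-- Find every item whose list contains 1989; no string handling or full-text index involved
select i.ItemID
from Item i
where exists (
    select 1
    from ItemValue v
    where v.ItemID = i.ItemID
      and v.Value = 1989
);
```

With this shape the values are real integers, so the search is a plain equality predicate that the index on Value can serve.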

All that said (maybe somebody, somewhere is rethinking an ill-advised design), I've put LIKE to good use when searching delimited strings:

-- Setup demo data
declare @delimitedInts table (
    data varchar(max) not null
)
insert into @delimitedInts select '0,1,2'
insert into @delimitedInts select '1,2,3,4'
insert into @delimitedInts select '5,10'

-- Create a search term
declare @searchTerm int = 2

-- Get all rows that contain the searchTerm
select data
from @delimitedInts
where ',' + data + ',' like '%,' + cast(@searchTerm as varchar(11)) + ',%'

-- Create many search terms
declare @searchTerms table (
    searchTerm int not null primary key
)
insert into @searchTerms select 2
insert into @searchTerms select 3
insert into @searchTerms select 4

-- Get all rows that contain ANY of the searchTerms
select distinct a.data
from @delimitedInts a
    join @searchTerms b on ',' + a.data + ',' like '%,' + cast(b.searchTerm as varchar(11)) + ',%'

-- Get all rows that contain ALL of the searchTerms
select a.data
from @delimitedInts a
    join @searchTerms b on ',' + a.data + ',' like '%,' + cast(b.searchTerm as varchar(11)) + ',%'
group by a.data
having count(*) = (select count(*) from @searchTerms)

Is this too slow for you? Maybe. Have you actually measured it? At least you could get an implementation in place and prove that it works before you optimize it.
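One way to get a first measurement is to turn on SET STATISTICS IO and SET STATISTICS TIME around the query and read the figures in the Messages output. This is only a sketch: the temp table name is made up, and three demo rows tell you nothing about real performance, so you would point it at a realistic copy of your own data.

```sql
-- Demo data in a temp table (hypothetical name) so the sketch is self-contained
create table #delimitedInts (
    data varchar(max) not null
);
insert into #delimitedInts (data) values ('0,1,2'), ('1,2,3,4'), ('5,10');

declare @searchTerm int = 2;

-- Report logical reads and CPU/elapsed time for the statements that follow
set statistics io on;
set statistics time on;

select data
from #delimitedInts
where ',' + data + ',' like '%,' + cast(@searchTerm as varchar(11)) + ',%';

set statistics time off;
set statistics io off;

drop table #delimitedInts;
```

Running the same harness around each alternative gives you numbers to compare instead of guesses.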

Update: XML

I've done a little testing on converting your space-delimited column to an XML column and querying it, including doing so with XML indexes. Unfortunately, you can't put an XML index on a computed column, so I'm using a trigger to keep an XML column automatically updated. Here are some interesting results (note the SQL comments):

-- Create a demo table
create table MyTable (
      ID int not null primary key identity
    , SpaceSeparatedInts varchar(max) not null
    --, ComputedIntsXml as cast('<ints><i>' + replace(SpaceSeparatedInts, ' ', '</i><i>') + '</i></ints>' as xml) persisted -- Can't use XML index
    , IntsXml xml null
)
go
-- Create trigger to update IntsXml
create trigger MyTable_Trigger on MyTable after insert, update as begin
    update m
    set m.IntsXml = cast('<ints><i>' + replace(m.SpaceSeparatedInts, ' ', '</i><i>') + '</i></ints>' as xml)
    from MyTable m
        join inserted i on m.ID = i.ID
end
go
-- Add some demo data
insert into MyTable (SpaceSeparatedInts) select '1'
insert into MyTable (SpaceSeparatedInts) select '1 2'
insert into MyTable (SpaceSeparatedInts) select '2 3 4'
insert into MyTable (SpaceSeparatedInts) select '5 6 7 10'
insert into MyTable (SpaceSeparatedInts) select '100 10 1000'
go

-- Search for the number 10 (and use this same query in subsequent testing, below)
select *
from MyTable
where IntsXml.exist('/ints/i[. = "10"]') = 1
-- This query spends virtually all of its time running an XML Reader and an XPath filter

-- Add a primary xml index
create primary xml index IX_MyTable_IntsXml on MyTable (IntsXml)
-- The query now uses a clustered index scan and clustered index seek on PrimaryXML

-- Add secondary xml index for value
create xml index IX_MyTable_IntsXml_Value on MyTable (IntsXml) using xml index IX_MyTable_IntsXml for value
-- No change

-- Add secondary xml index for path
create xml index IX_MyTable_IntsXml_Path on MyTable (IntsXml) using xml index IX_MyTable_IntsXml for path
-- No change

-- Add secondary xml index for property
create xml index IX_MyTable_IntsXml_Property on MyTable (IntsXml) using xml index IX_MyTable_IntsXml for property
-- The query now replaces the clustered index scan on PrimaryXML with an index seek on SecondaryXML

While it is clearly a different method, is this faster than LIKE? You have to test in your environment. Hopefully this will give you some ideas of how to do so. Please let me know how this works out for you, if it's doable in your shop.
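If you go the XML route, you will probably want to pass the search term in rather than hard-code it in the XPath. A minimal sketch, assuming the same `<ints><i>…</i></ints>` shape as above; the table variable and the @searchTerm variable are hypothetical. I've kept the term as varchar so the comparison matches the string comparison in the earlier query.

```sql
-- Hypothetical stand-in for MyTable so the sketch runs on its own
declare @demo table (
      ID int not null primary key
    , IntsXml xml null
);
insert into @demo (ID, IntsXml) values (1, '<ints><i>5</i><i>6</i><i>7</i><i>10</i></ints>');
insert into @demo (ID, IntsXml) values (2, '<ints><i>2</i><i>3</i><i>4</i></ints>');

-- Search term supplied as a variable instead of a literal inside the XPath
declare @searchTerm varchar(11) = '10';

-- sql:variable() exposes the T-SQL variable to the XQuery expression
select ID
from @demo
where IntsXml.exist('/ints/i[. = sql:variable("@searchTerm")]') = 1;
```

Whether this still picks up the XML indexes the way the literal version does is something to check in the execution plan on your side.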

answered 2012-10-18T21:43:30.083
1

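I'm also not sure I understand what you're looking for, but if you want to store multiple values in a single column, your best option is to use XML.

For more on the concept, see this post:

Querying XML columns in SQLServer 2005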

answered 2012-10-18T22:33:28.227