5

我有一些客户具有用户正在搜索的特定地址:

123 通用方式

数据库中有 5 行匹配:

ResidentialAddress1
=============================
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY

我运行 FT 查询来查找这些行。当我向搜索添加更多条件时,我将向您展示每个步骤:

SELECT ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.ResidentialAddress1, '"123*"')

ResidentialAddress1
=========================
123 MAPLE STREET
12345 TEST
123 MINE STREET
123 GENERIC WAY
123 FAKE STREET
...

(30 row(s) affected)

好的,到目前为止一切顺利,现在添加“通用”一词:

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')

ResidentialAddress1
=============================
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY
123 GENERIC WAY

(5 row(s) affected)

优秀的。现在我将添加用户想要确保存在的最终关键字:

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"way*"')


ResidentialAddress1            
------------------------------ 

(0 row(s) affected)

嗯?没有行?如果我只查询“way*”怎么办:

SELECT ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.ResidentialAddress1, '"way*"')

ResidentialAddress1            
------------------------------ 

(0 row(s) affected)

起初我认为可能是因为*, 并且它要求根way后面有更多字符。但这不是真的:

  • 搜索“123*”匹配“123”
  • 搜索“generic*”匹配“generic”
  • 在线书籍说,星号匹配零个、一个或多个字符

如果我删除*just for s&g 怎么办:

SELECT ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.ResidentialAddress1, '"way"')

Server: Msg 7619, Level 16, State 1, Line 1
A clause of the query contained only ignored words. 

因此,有人可能会认为您甚至不允许单独或作为根进行搜索。way但这也不是真的:

SELECT * FROM Patrons
WHERE CONTAINS(Patrons.*, '"way*"')

AccountNumber FirstName Lastname
------------- --------- --------
33589         JOHN      WAYNE                    

综上所述,用户正在搜索包含所有单词的行:

123 通用方式

我正确地翻译成WHERE条款:

SELECT * FROM Patrons
WHERE CONTAINS(Patrons.*, '"123*"')
AND CONTAINS(Patrons.*, '"generic*"')
AND CONTAINS(Patrons.*, '"way*"')

它不返回任何行。告诉我这行不通,这不是我的错,而且 SQL Server 很疯狂。

注意:我已经清空了 FT 索引并重建了它。

更新一

SELECT Lastname, ResidentialAddress1 FROM Patrons
WHERE CONTAINS(Patrons.*, '"gen*"')

Lastname                  ResidentialAddress1            
------------------------- ------------------------------ 
SAVE                      123 GENERIC WAY
Genders                   
SAVE                      123 GENERIC WAY
Patron                    123 GENERIC WAY
SAVE                      123 GENERIC WAY
SAVE                      234 GENERIC WAY
SAVE                      123 GENERIC WAY

(7 row(s) affected)

更新二

假装用户输入:

第 123 章

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"generic*"')
AND CONTAINS(Patrons.ResidentialAddress1, '"wa*"')

ResidentialAddress1            
------------------------------ 

(0 row(s) affected)

真正的问题是用户正在输入完全有效的内容,他们希望看到任何人都希望看到的内容。


更新三

有人要求这一切,这不是我的错!:

CREATE TABLE [dbo].[Patrons] (
    [PatronGUID]  uniqueidentifier ROWGUIDCOL  NOT NULL ,
    [AccountNumber] [bigint] NULL ,
    [FirstName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [MiddleInitial] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Lastname] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [EyeColor] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [HairColor] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Gender] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Birthday] [datetime] NULL ,
    [Height] [int] NULL ,
    [Weight] [int] NULL ,
    [FacialHair] [tinyint] NULL ,
    [Nationality] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [IdentifyingMarks] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [DriversLicenseExpires] [datetime] NULL ,
    [DriversLicenseDateVerified] [datetime] NULL ,
    [PassportNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PassportRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PassportCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PassportExpires] [datetime] NULL ,
    [PassportDateVerified] [datetime] NULL ,
    [OtherIdentificationNumber] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationCountry] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [OtherIdentificationExpires] [datetime] NULL ,
    [OtherIdentificationDateVerified] [datetime] NULL ,
    [ResidentialAddress1] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialAddress2] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialAddress3] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialCity] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialZipCode] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialCountry] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ResidentialPhoneNumber] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [CountryOfResidence] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessAddress1] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessAddress2] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessAddress3] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessCity] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessRegion] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessZipCode] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessCountry] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [BusinessPhone] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PositionWithFirm] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [EmployerTelephone] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [MemberCardType] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PlayerStatusCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountType] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountStatus1] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [AccountStatus2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [IsVIPExchangeRate] [tinyint] NULL ,
    [ChangedUserGUID_Depricated] [uniqueidentifier] NULL ,
    [ChangedUser] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ChangedDate] [datetime] NULL ,
    [ChangedWorkstation] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [PendingUpdates_Depricated] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL 
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Patrons] ADD 
    CONSTRAINT [DF_Patrons_PatronGUID] DEFAULT (newid()) FOR [PatronGUID],
    CONSTRAINT [PK_Patrons] PRIMARY KEY  NONCLUSTERED 
    (
        [PatronGUID]
    ) WITH  FILLFACTOR = 90  ON [PRIMARY] 
GO

if (select DATABASEPROPERTY(DB_NAME(), N'IsFullTextEnabled')) <> 1 
exec sp_fulltext_database N'enable' 

GO

if not exists (select * from dbo.sysfulltextcatalogs where name = N'TheFullTextCatalog')
exec sp_fulltext_catalog N'TheFullTextCatalog', N'create' 

GO

exec sp_fulltext_table N'[dbo].[Patrons]', N'create', N'TheFullTextCatalog', N'PK_Patrons'
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'FirstName', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'MiddleInitial', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'Lastname', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'EyeColor', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'IdentifyingMarks', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress1', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress2', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialAddress3', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialCity', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialZipCode', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialRegion', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialCountry', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'ResidentialPhoneNumber', N'add', 1033  
GO

exec sp_fulltext_column N'[dbo].[Patrons]', N'CountryOfResidence', N'add', 1033  
GO

exec sp_fulltext_table N'[dbo].[Patrons]', N'activate'  
GO

这是不相信我的人的屏幕截图:

应该有效但无效的查询:

在此处输入图像描述

有效但无用的查询:

在此处输入图像描述

有效但无用的查询,带有证明内容:

在此处输入图像描述


更新四

查询不能写成

CONTAINS(Patrons.*, 'words...')

因为有些项目在逻辑上或物理上没有被 FT 索引覆盖。例如,用户查询:

6/4/2010 伊恩博伊德 619

提出四个关键词:

  • 2010 年 6 月 4 日
  • 伊恩
  • 博伊德
  • 619

这意味着他们希望所有条件都成立,伪代码为:

WHERE 6/4/2010 is in the row
AND ian is in the row
AND boyd is in the row
AND 619 is in the row

这被翻译成部分查询:

WHERE --Keyword 1: 6/4/2010
(
   ((ChangedDate >= '20100604') AND (ChangedDate < '20100605'))
   OR 
   ((LastTransactionDate >= '20100604') AND (LastTransactionDate < '20100605'))
   OR 
   (CONTAINS(Patrons.*, '"6/4/2010*"')
)
AND --Keyword 2: ian
(
    CONTAINS(Patrons.*, '"ian*"')
)
AND --Keyword 3: boyd
(
    CONTAINS(Patrons.*, '"boyd*"')
)
AND --Keyword 4: 619
(
    (AccountNumber IN (SELECT CAST(619 AS bigint)))
    OR
    (CONTAINS(Patrons.*, '"619*"'))
)

其中一位回答者正在查看原始问题中提供的简化示例;不是现实世界。要说有多个从句是不正确的,那是不正确的。AND

4

5 回答 5

6

该消息告诉您“方式”是一个停用词,这意味着它被忽略且未编入索引。这就是为什么您可以找到“wayne”但找不到“way”的原因。

所以,不,这不是疯了,你也不是。这只是一个简单的误解。

于 2010-06-03T13:31:35.247 回答
4

您可能在创建 FT 索引时使用了系统停止列表。这个词way恰好在那里。您可以使用以下查询查看它:

SELECT *
FROM sys.fulltext_system_stopwords
WHERE stopword = 'way'
AND language_id = 1033

您可以关闭停止列表或创建自定义停止列表,但更好的解决方案是正确编写查询;不要使用多个WHERE CONTAINS子句,将它们合并为一个。否则 SQL Server 可能无法有效地使用 FT 索引。

您的查询应如下所示:

SELECT ResidentialAddress1 FROM Patrons
WHERE  CONTAINS(Patrons.ResidentialAddress1, '"123*" AND "generic*" AND "way*"')

如果你这样做,停用词就会被忽略;如果您没有包含 term ,它仍然会返回所有相同的结果way*


编辑:刚刚注意到您标记了 this sql-server-2000,所以第一个查询可能不起作用。在 SQL 2000 中,它们是“干扰词”,我相信配置是全局的,您没有单独的停止列表。尽管如此,如果您编写一个WHERE CONTAINS子句而不是多个子句,您仍然会得到结果。

要在 SQL Server 2000 中编辑干扰词,您必须在 SQL ServerFTDATA配置文件夹中编辑特定于语言的文件。更多详细信息在这里:SQL Server 全文搜索噪声词和同义词库配置

于 2010-06-03T13:53:50.447 回答
3

解决方案1:

您想尝试转换噪声词 选项(SQL 2008)

关闭此功能,应停止删除单词。

例子:

sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'transform noise words', 1
RECONFIGURE
GO

编辑1:

希望旧版本的 MS SQL 可能有类似的东西?

于 2010-06-03T13:49:24.557 回答
2

http://www.codinghorror.com/blog/2008/11/stop-me-if-you-think-youve-seen-this-word-before.html

在那篇文章中,杰夫提到你可以关闭它们。

于 2010-06-03T13:40:11.197 回答
0

也许它需要三个以上的字母字符。尝试另一个三个字母的单词,例如gen*.

于 2010-06-03T13:32:47.810 回答