sql - Optimizing a SQL function to get common elements

Question

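I have a function that takes two delimited strings and returns the number of common elements.

The main code of the function is (@commonCount is the expected return value):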

    SET @commonCount = (select count(*) from (
    select token from dbo.splitString(@userKeywords, ';')
    intersect
    select token from dbo.splitString(@itemKeywords, ';')) as total)

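Here splitString uses a while loop and charIndex to split the string into delimited tokens and inserts them into a table.

The problem I have is that this only processes about 100 rows per second, and given the size of my data set it would take roughly 8-10 days to complete.

Both strings can be up to 1500 characters long.

Is there any way I can make this fast enough to be usable?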

score 1 · Accepted Answer

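The performance problem is probably the combination of cursor-style, row-by-row processing (the while loop) and the user-defined function.

If one of these strings is constant (for example, the item keywords), you can search for each keyword independently: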

    -- ';' is added to both ends of keywords so the first and last keywords also match
    select *
    from users u
    where charindex(';'+<item1>+';', ';'+u.keywords+';') > 0
    union all
    select *
    from users u
    where charindex(';'+<item2>+';', ';'+u.keywords+';') > 0
    union all
    ...
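Because the original goal is a count rather than the matching rows, the per-keyword searches can also be folded into one grouped query. This is only a sketch: the literal keywords 'red', 'green' and 'blue' are hypothetical placeholders for the constant item keywords, and it assumes users has a userid column and that the item keywords are distinct.

```sql
-- Hypothetical constant item keywords: 'red', 'green', 'blue'.
-- Each (user, item keyword) pair that matches contributes one row,
-- so count(*) is the number of common keywords for that user.
select u.userid, count(*) as commonCount
from users u join
     (values ('red'), ('green'), ('blue')) as i(keyword)
     on charindex(';' + i.keyword + ';', ';' + u.keywords + ';') > 0
group by u.userid;
```

Users with no keyword in common do not appear in the result; a left join from users would be needed to report them with a count of 0.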

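Alternatively, a set-based approach may work, but you have to normalize the data (a plug here for having the data in the right format to begin with). That is, you want a table that has: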

userid
keyword

And another that has:

itemid
keyword

(That is, if there are different types of items. Otherwise this is just a list of keywords.)
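As a rough sketch, the two tables could be declared as follows. The column types, the key choices, and the index name are assumptions made for illustration; the answer only specifies the column names.

```sql
-- One row per (user, keyword); the primary key also rules out duplicates.
create table userkeyword (
    userid  int          not null,
    keyword varchar(200) not null,   -- assumed maximum keyword length
    primary key (userid, keyword)
);

-- One row per (item, keyword).
create table itemkeyword (
    itemid  int          not null,
    keyword varchar(200) not null,   -- assumed maximum keyword length
    primary key (itemid, keyword)
);

-- Hypothetical index to support joining the two tables on keyword.
create index ix_itemkeyword_keyword on itemkeyword (keyword);
```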

Your query would then look like:

    select *
    from userkeyword uk join
         itemkeyword ik
         on uk.keyword = ik.keyword

The SQL engine can then work its magic.
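To get back the number the original function returned, the join can be aggregated per user and item. A minimal sketch, assuming the userkeyword and itemkeyword tables described above:

```sql
-- count(distinct ...) mirrors the INTERSECT in the original function,
-- which also counts each common keyword only once.
select uk.userid, ik.itemid, count(distinct uk.keyword) as commonCount
from userkeyword uk join
     itemkeyword ik
     on uk.keyword = ik.keyword
group by uk.userid, ik.itemid;
```

Pairs with no keyword in common produce no row, which may or may not be what the caller expects.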

Now, how do you create such a list? If you only have a handful of keywords per user, you can do something like this:

    -- Note: only keywords that are followed by a ';' are extracted.
    with keyword1 as (select u.*, charindex(';', keywords) as pos1,
                             left(keywords, charindex(';', keywords)-1) as keyword1
                      from users u
                      where charindex(';', keywords) > 0
                     ),
         -- each later CTE reads from the previous one so that pos1 is in scope
         keyword2 as (select k.*, charindex(';', keywords, pos1+1) as pos2,
                             substring(keywords, pos1+1,
                                       charindex(';', keywords, pos1+1)-pos1-1) as keyword2
                      from keyword1 k
                      where charindex(';', keywords, pos1+1) > 0
                     ),
            ...
    select userid, keyword1
    from keyword1
    union all
    select userid, keyword2
    from keyword2
    ...
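On SQL Server 2016 or later (database compatibility level 130 and up), the built-in STRING_SPLIT function is a different way to build the same list without one CTE per keyword. This is a substitute technique, not part of the original answer, and the sketch assumes a userkeyword(userid, keyword) table already exists and that users has userid and keywords columns.

```sql
-- STRING_SPLIT returns one row per token in a column named value.
insert into userkeyword (userid, keyword)
select distinct u.userid, s.value
from users u
     cross apply string_split(u.keywords, ';') as s
where s.value <> '';   -- skip empty tokens from leading, trailing or doubled ';'
```

The same statement, pointed at the item table, would populate itemkeyword.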

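To get the maximum number of elements in itemKeyWords, you can use a query like the following. It counts the delimiters in each row, so add 1 if the strings do not end with a ';':

    select max(len(Keywords) - len(replace(Keywords, ';', '')))
    from users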
