
I want a split function in SQL Server. I came across this thread: Cannot find either column "dbo" or the user-defined function or aggregate "dbo.Splitfn", or the name is ambiguous

It seems to me that the function there does too many calculations (index arithmetic and so on), so I wrote this function instead:

ALTER FUNCTION [dbo].[Split]
(
    @Data   varchar(8000),
    @Delimter   char(1) = ','
)
RETURNS @RetVal TABLE 
(
    Data    varchar(8000)
)
AS
Begin
    Set @Data = RTrim(Ltrim(IsNull(@Data,'')))
    Set @Delimter = IsNull(@Delimter,',')

    If Substring(@Data,Len(@Data),1) <> @Delimter
    Begin
        Set @Data = @Data + @Delimter
    End

    Declare @Len int = Len(@Data)
    Declare @index int = 1
    Declare @Char char(1) = ''
    Declare @part varchar(8000) = ''

    While @index <= @Len
    Begin

        Set @Char = Substring(@Data,@index,1)       
        If @Char = @Delimter
        Begin
            -- Emit the accumulated part; skip empty parts caused by consecutive delimiters
            If @part <> ''
            Begin
                Insert into @RetVal Values (@part)
                Set @part = ''
            End
        End
        Else
        Begin
            Set @part = @part + @Char
        End

        Set @index = @index + 1
    End

    RETURN;
End

Can anybody comment on which one is more efficient? I will be using this function heavily to split data for one of my scraping applications, and I want it to be efficient. Also, please mention how you measured its efficiency.


4 Answers


The links below discuss different string-splitting methods and their efficiency, but I tend to try to get people to stop doing this in T-SQL at all. You can spend hours fighting with inefficient functions to squeeze a few extra microseconds out of them, but it's an exercise in futility. T-SQL is inherently slow at this task, and it's much better to go outside of T-SQL, either by using CLR (2005+) or table-valued parameters (TVPs) (2008+). I recently published a three-part series on this that is likely worth a read, and I suspect you'll come to the same conclusions I did (CLR is good, TVPs are better, and all T-SQL methods just look silly in comparison):

http://www.sqlperformance.com/2012/07/t-sql-queries/split-strings

http://www.sqlperformance.com/2012/08/t-sql-queries/splitting-strings-follow-up

http://www.sqlperformance.com/2012/08/t-sql-queries/splitting-strings-now-with-less-t-sql
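
For readers who have not used TVPs, here is a minimal sketch of the idea, not code taken from the articles. The names dbo.StringList and dbo.ProcessItems are hypothetical and exist only for illustration. The point is that the client sends rows that are already separated, so nothing has to be split on the server.

```sql
-- Hypothetical table type: one row per item, so no splitting is needed.
CREATE TYPE dbo.StringList AS TABLE (Item varchar(8000) NOT NULL);
GO

-- Hypothetical procedure that receives the items as a table-valued parameter.
CREATE PROCEDURE dbo.ProcessItems
    @Items dbo.StringList READONLY   -- TVPs must be declared READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Item FROM @Items;
END
GO

-- Calling it from T-SQL. A client application would bind its own
-- collection of rows to the parameter instead of building it here.
DECLARE @list dbo.StringList;
INSERT INTO @list (Item) VALUES ('a'), ('b'), ('c');
EXEC dbo.ProcessItems @Items = @list;
```

This requires SQL Server 2008 or later.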

Also, please mention how you measured its efficiency.

Well, you can do what I did in those articles: select SYSDATETIME() before and after you run each test, and then calculate the difference. You can also log to a table before and after each test, capture the timings with Profiler, or surround your test with:

SET STATISTICS TIME ON;

PRINT 'Test 1';

-- do test 1 here

PRINT 'Test 2';

-- do test 2 here

SET STATISTICS TIME OFF;

You'll get output in the messages pane like:

Test 1

SQL Server execution times:
  CPU time: 247 ms, elapsed time: 345 ms

Test 2

SQL Server execution times:
  CPU time: 332 ms, elapsed time: 421 ms
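
The SYSDATETIME() approach mentioned earlier can be sketched roughly as follows. This is an illustration, not the exact harness from the articles. It assumes the dbo.Split function from the question exists, and the input string is a placeholder.

```sql
-- Capture the time before the test.
DECLARE @start datetime2(7) = SYSDATETIME();

-- Run the test; assigning to a variable avoids timing the
-- transfer of a result set to the client.
DECLARE @rows int;
SELECT @rows = COUNT(*) FROM dbo.Split('a,b,c,d,e', ',');

-- Capture the time after the test and report the difference.
DECLARE @end datetime2(7) = SYSDATETIME();

SELECT DATEDIFF(MICROSECOND, @start, @end) AS elapsed_microseconds,
       @rows AS rows_returned;
```

A single call on a short string will usually be too fast to measure meaningfully, so in practice the test is run against much larger inputs or many times over.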

Finally, you can use our free tool, SQL Sentry Plan Explorer. (Disclaimer: I work for SQL Sentry.)

You can feed any query into Plan Explorer, generate an actual plan, and in addition to a graphical plan that is much more readable than the showplan put out by Management Studio, you also get runtime metrics such as duration, CPU and reads. So you can run two queries and compare them side by side without doing any of the above.

answered 2012-08-23T15:33:23.163

Another, different approach:

CREATE FUNCTION [dbo].[fGetTableFromList]
(
    @list VARCHAR(max), @delimiter VARCHAR(10)
)
RETURNS @table TABLE
(value VARCHAR(8000)) AS
BEGIN

DECLARE @list1 VARCHAR(8000), @pos INT, @rList VARCHAR(MAX);

SET @list = LTRIM(RTRIM(@list)) + @delimiter
SET @pos = CHARINDEX(@delimiter, @list, 1)

WHILE @pos > 0
    BEGIN
        SET @list1 = LTRIM(RTRIM(LEFT(@list, @pos - 1)))

        IF @list1 <> ''
            INSERT INTO @table(value) VALUES (@list1)

        -- Skip the whole delimiter, which may be longer than one character
        SET @list = SUBSTRING(@list, @pos + DATALENGTH(@delimiter), LEN(@list))
        SET @pos = CHARINDEX(@delimiter, @list, 1)
    END
RETURN 
END

In terms of CPU time, there is not much difference between dbo.Split, dbo.SplitString, and my dbo.fGetTableFromList. I know this from running the following:

SET STATISTICS TIME ON;

SELECT * FROM [dbo].[Split]('Lorem ipsum dolor sit amet,...', ' ');
SELECT * FROM [dbo].[SplitString]('Lorem ipsum dolor sit amet,...', ' ');
SELECT * FROM [dbo].[fGetTableFromList]('Lorem ipsum dolor sit amet,...', ' ');

SET STATISTICS TIME OFF;

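Of course, the more times you run this, recording the timings and testing with different inputs, the more accurately you will know which function performs better.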
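
One way to repeat a test many times is a simple loop, sketched below. The iteration count and the generated input are arbitrary choices for illustration. dbo.fGetTableFromList is used because it accepts VARCHAR(max), whereas dbo.SplitString as posted in this thread would truncate input beyond 1000 characters.

```sql
-- Build a longer input by repeating a 27-character phrase (2,700 characters).
DECLARE @input varchar(8000) = REPLICATE('Lorem ipsum dolor sit amet ', 100);

DECLARE @i int = 1,
        @rows int,
        @start datetime2(7) = SYSDATETIME();

-- Call the function repeatedly so that the total is large enough to measure.
WHILE @i <= 1000
BEGIN
    SELECT @rows = COUNT(*) FROM dbo.fGetTableFromList(@input, ' ');
    SET @i += 1;
END

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS total_ms,
       @rows AS rows_per_call;
```

Swapping in each function in turn, and varying the input length and delimiter, gives a fairer comparison than a single run.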

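You should also pay attention to the execution plans. Remove the SET STATISTICS statements, run the three queries above, and tell SSMS to show you the execution plan.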

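Just by looking at the summary shown in the "Execution plan" tab, without examining the details, you can see that the first one, Split, accounts for 13% of the estimated work, the second, SplitString, for 60%, and the third, fGetTableFromList, again for 13% (the rest of the work is spent by the SELECTs).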
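
If you prefer not to rely on the SSMS toolbar button, the actual plan can also be requested from T-SQL. This is a small sketch, with a placeholder input string:

```sql
-- Return the actual execution plan as XML alongside each result set.
SET STATISTICS XML ON;

SELECT * FROM [dbo].[Split]('a,b,c', ',');
SELECT * FROM [dbo].[fGetTableFromList]('a,b,c', ',');

SET STATISTICS XML OFF;
```

Each statement then returns an extra result set containing its plan, which SSMS can open graphically.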

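This is the "dummy" way, not one for DBAs. If you need accurate or precise benchmarks, you should write some stress tests and extract concise results (as in the links provided by @AaronBertrand).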

answered 2012-08-23T15:38:10.600

@AnandPhadke,

I don't understand what you are doing with the CTE. This works just fine:

Create function dbo.SplitString(@inputStr varchar(1000),@del varchar(5))
RETURNS @table TABLE(col varchar(100))
As
BEGIN

DECLARE @t table(col1 varchar(100))
INSERT INTO @t
select @inputStr

if CHARINDEX(@del,@inputStr,1) > 0
BEGIN
    ;WITH CTE1 as (
    select ltrim(rtrim(LEFT(col1,CHARINDEX(@del,col1,1)-1))) as col,RIGHT(col1,LEN(col1)-CHARINDEX(@del,col1,1)) as rem from @t
    union all
    select ltrim(rtrim(LEFT(rem,CHARINDEX(@del,rem,1)-1))) as col,RIGHT(rem,LEN(rem)-CHARINDEX(@del,rem,1))
    from CTE1 c
    where CHARINDEX(@del,rem,1)>0
    )

        INSERT INTO @table 
        select col from CTE1
        union all
        select rem from CTE1 where CHARINDEX(@del,rem,1)=0
    END
ELSE
BEGIN
    INSERT INTO @table 
    select col1 from @t
END

RETURN

END
answered 2012-08-23T14:04:02.667

Try this:

CREATE function dbo.SplitString(@inputStr varchar(1000),@del varchar(5))
RETURNS @table TABLE(col varchar(100))
As
BEGIN

DECLARE @t table(col1 varchar(100))
INSERT INTO @t
select @inputStr

if CHARINDEX(@del,@inputStr,1) > 0
BEGIN
;WITH CTE as(select ROW_NUMBER() over (order by (select 0)) as id,* from @t)
,CTE1 as (
select id,ltrim(rtrim(LEFT(col1,CHARINDEX(@del,col1,1)-1))) as col,RIGHT(col1,LEN(col1)-CHARINDEX(@del,col1,1)) as rem from CTE
union all
select c.id,ltrim(rtrim(LEFT(rem,CHARINDEX(@del,rem,1)-1))) as col,RIGHT(rem,LEN(rem)-CHARINDEX(@del,rem,1))
from CTE1 c
where CHARINDEX(@del,rem,1)>0
)

INSERT INTO @table 
select col from CTE1
union all
select rem from CTE1 where CHARINDEX(@del,rem,1)=0
END
ELSE
BEGIN
INSERT INTO @table 
select col1 from @t
END


RETURN

END
answered 2012-08-23T13:35:37.497