编辑 - 为了清楚起见,我会将目标句放在顶部。测试和我的问题是是否有办法在不使用临时表的情况下获得与临时表相同的性能。
我觉得这应该是一个简单的问题,但我被困住了。我正在 SQL2014 中试验 FileTables。我知道一些可行的替代方案,但目标是确定从文件表中提取文本子字符串的可行性。
该测试有 35,000 个文本文件,其中一行文本如下,每个文件平均有 100 个字节的非 unicode 文本。
Aaa|Bbb|Ccc|Ddd|Eee|Fff|Ggg
所需的输出是每个文件的一行,并将分隔字符串分成七列。
我找到了一个快速的字符串解析器函数,但与 varchar 列相比,在文件流上运行对性能有显着影响。
此查询需要 18 秒才能运行。我试图让从文件流到 varchar 的转换只执行一次,但我认为调用 UDF 可能会导致它发生在每一行(文件)。
Create View vAddresses As
Select file_type, Convert(Varchar(8000),file_stream) TextData /* Into #Temp */ From InputFiles Where file_type = 'adr'
Go
Select --TextData,
dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
dbo.udf_StringSplit(TextData, 7, '|')--, TextData
From vAddresses
我已经尝试将其作为视图、cte 和子查询。唯一似乎有帮助的是创建一个临时表。创建临时表需要 1 秒,查询需要 1 秒。因此,对于 35k 行,总查询时间为 2 秒,而总查询时间为 18 秒。
Drop Table #Temp
(Select file_type, Convert(Varchar(8000),file_stream) TextData Into #Temp From HumanaInputFiles Where file_type = 'adr')
Select --TextData,
dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
dbo.udf_StringSplit(TextData, 7, '|')--, TextData
From #Temp
我已经阅读了很多关于文件表和临时表与单个查询性能主题的文章和博客,但我似乎无法弄清楚。它可能与 sargable 或统计有关?非常感谢任何建议。
这是 UDF,我在 MSDN 博客/论坛上找到了它,它是迄今为止我发现的表现最好的。
ALTER FUNCTION [dbo].[udf_StringSplit](
@TEXT varchar(8000)
,@COLUMN tinyint
,@SEPARATOR char(1)
)RETURNS varchar(8000)
AS
BEGIN
DECLARE @POS_START int = 1
DECLARE @POS_END int = CHARINDEX(@SEPARATOR, @TEXT, @POS_START)
WHILE (@COLUMN >1 AND @POS_END> 0)
BEGIN
SET @POS_START = @POS_END + 1
SET @POS_END = CHARINDEX(@SEPARATOR, @TEXT, @POS_START)
SET @COLUMN = @COLUMN - 1
END
IF @COLUMN > 1 SET @POS_START = LEN(@TEXT) + 1
IF @POS_END = 0 SET @POS_END = LEN(@TEXT) + 1
RETURN SUBSTRING (@TEXT, @POS_START, @POS_END - @POS_START)
END
这是临时表的执行计划。
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="12.0.4100.1" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="17486" StatementId="1" StatementOptmLevel="TRIVIAL" CardinalityEstimationModelVersion="120" StatementSubTreeCost="0.166487" StatementText="Select --TextData,
 dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
 dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
 dbo.udf_StringSplit(TextData, 7, '|')--, TextData
 From #Temp" StatementType="SELECT" QueryHash="0xC4D6F0215D332F3D" QueryPlanHash="0xC50CFAF9494B5DBE" RetrievedFromCache="true">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="CouldNotGenerateValidParallelPlan" CachedPlanSize="24" CompileTime="1" CompileCPU="1" CompileMemory="168">
<MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="838735" EstimatedPagesCached="419367" EstimatedAvailableDegreeOfParallelism="4" />
<RelOp AvgRowSize="28023" EstimateCPU="0.0017486" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.166487">
<OutputList>
<ColumnReference Column="Expr1003" />
<ColumnReference Column="Expr1004" />
<ColumnReference Column="Expr1005" />
<ColumnReference Column="Expr1006" />
<ColumnReference Column="Expr1007" />
<ColumnReference Column="Expr1008" />
<ColumnReference Column="Expr1009" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1003" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(1),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(1)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1004" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(2),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(2)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(3),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(3)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1006" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(4),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(4)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1007" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(5),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(5)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1008" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(6),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(6)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1009" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(7),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(7)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="4011" EstimateCPU="0.0193131" EstimateIO="0.145426" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Table Scan" NodeId="1" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.164739" TableCardinality="17486">
<OutputList>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<TableScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</DefinedValue>
</DefinedValues>
<Object Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" IndexKind="Heap" Storage="RowStore" />
</TableScan>
</RelOp>
</ComputeScalar>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
这是视图的计划。
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="12.0.4100.1" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="17486" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" CardinalityEstimationModelVersion="120" StatementSubTreeCost="0.905265" StatementText="Select --TextData,
 dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
 dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
 dbo.udf_StringSplit(TextData, 7, '|')--, TextData
 From vAddresses" StatementType="SELECT" QueryHash="0xB4F8A0B288802C4E" QueryPlanHash="0x28DA02D774B1AF53" RetrievedFromCache="true">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="CouldNotGenerateValidParallelPlan" CachedPlanSize="32" CompileTime="3" CompileCPU="3" CompileMemory="520">
<Warnings>
<PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT(varchar(8000),[DmProd01].[dbo].[HumanaInputFiles].[file_stream],0)" />
</Warnings>
<MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="838735" EstimatedPagesCached="419367" EstimatedAvailableDegreeOfParallelism="4" />
<RelOp AvgRowSize="28023" EstimateCPU="0.0017486" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.905265">
<OutputList>
<ColumnReference Column="Expr1004" />
<ColumnReference Column="Expr1005" />
<ColumnReference Column="Expr1006" />
<ColumnReference Column="Expr1007" />
<ColumnReference Column="Expr1008" />
<ColumnReference Column="Expr1009" />
<ColumnReference Column="Expr1010" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1004" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(1),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(1)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(2),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(2)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1006" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(3),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(3)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1007" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(4),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(4)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1008" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(5),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(5)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1009" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(6),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(6)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1010" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(7),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(7)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="4019" EstimateCPU="0.0034972" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Compute Scalar" NodeId="1" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.88673">
<OutputList>
<ColumnReference Column="Expr1011" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1011" />
<ScalarOperator ScalarString="CONVERT(varchar(8000),[DmProd01].[dbo].[HumanaInputFiles].[file_stream],0)">
<Convert DataType="varchar" Length="8000" Style="0" Implicit="false">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_stream" />
</Identifier>
</ScalarOperator>
</Convert>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="4043" EstimateCPU="0.0386262" EstimateIO="0.844606" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Table Scan" NodeId="2" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.883233" TableCardinality="34972">
<OutputList>
<ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_stream" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<TableScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_stream" />
</DefinedValue>
</DefinedValues>
<Object Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" IndexKind="Heap" Storage="RowStore" />
<Predicate>
<ScalarOperator ScalarString="[DmProd01].[dbo].[HumanaInputFiles].[file_type]=N'adr'">
<Compare CompareOp="EQ">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_type" ComputedColumn="true" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="N'adr'" />
</ScalarOperator>
</Compare>
</ScalarOperator>
</Predicate>
</TableScan>
</RelOp>
</ComputeScalar>
</RelOp>
</ComputeScalar>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
编辑:对此进行了不同的搜索,发现答案是使用 top 和 order by。这使它缩短到 4 秒。似乎有点做作,仍然没有解释查看查询计划如何帮助解决这个问题,所以我自己不会回答这个问题,而是让它保持打开状态。