要进行这种分析,您需要一个完整的 (T-)SQL 解析器,它可以访问数据库的当前结构(提示:列表达式可以变得任意复杂,包括子选择和一切)。
更新:我在下面的查询计划的想法不起作用。虽然该计划很好地跟踪列引用,但它没有以正确的顺序显示输出列。啊!我目前看不到如何治愈,属性中没有序数。所以,对不起,不起作用。我还是把它留给参考,也许其他人有基于此的想法。
想到的一件事是查看查询计划。您必须解析 XML 计划以查看某个列的值的来源。虽然这可能比直接查看 SQL 更简单,但它仍然不是微不足道的。查询计划也可能变得相当复杂。但是为了说明原理,让我们看一下 AdventureWorks DB 上的类似查询:
SELECT
rowguid,
Left(JobTitle,3),
'hardcoded string',
BirthDate,
Replace(BirthDate,'@xyz.com','')
FROM
HumanResources.Employee
此语句的 XML 查询计划如下所示:
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="11.0.3128.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="290" StatementId="1" StatementOptmLevel="TRIVIAL" StatementSubTreeCost="0.00807444" StatementText="Select rowguid, Left(JobTitle,3),'hardcoded string',BirthDate,Replace(BirthDate,'@xyz.com','') FROM HumanResources.Employee" StatementType="SELECT" QueryHash="0x5F035E11344539B" QueryPlanHash="0x717B3D06C26C61ED" RetrievedFromCache="false">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan CachedPlanSize="16" CompileTime="14" CompileCPU="12" CompileMemory="152">
<MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="418321" EstimatedPagesCached="104580" EstimatedAvailableDegreeOfParallelism="2" />
<RelOp AvgRowSize="4045" EstimateCPU="2.9E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="290" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.00807444">
<OutputList>
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="BirthDate" />
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="rowguid" />
<ColumnReference Column="Expr1003" />
<ColumnReference Column="Expr1004" />
<ColumnReference Column="Expr1005" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1003" />
<ScalarOperator ScalarString="substring([AdventureWorks2012].[HumanResources].[Employee].[JobTitle],(1),(3))">
<Intrinsic FunctionName="substring">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="JobTitle" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(1)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(3)" />
</ScalarOperator>
</Intrinsic>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1004" />
<ScalarOperator ScalarString="'hardcoded string'">
<Const ConstValue="'hardcoded string'" />
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="replace(CONVERT_IMPLICIT(varchar(40),[AdventureWorks2012].[HumanResources].[Employee].[BirthDate],121),'@xyz.com','')">
<Intrinsic FunctionName="replace">
<ScalarOperator>
<Convert DataType="varchar" Length="40" Style="121" Implicit="true">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="BirthDate" />
</Identifier>
</ScalarOperator>
</Convert>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'@xyz.com'" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="''" />
</ScalarOperator>
</Intrinsic>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="80" EstimateCPU="0.000476" EstimateIO="0.00756944" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="290" LogicalOp="Clustered Index Scan" NodeId="1" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="0.00804544" TableCardinality="290">
<OutputList>
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="JobTitle" />
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="BirthDate" />
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="rowguid" />
</OutputList>
<IndexScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="JobTitle" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="BirthDate" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Column="rowguid" />
</DefinedValue>
</DefinedValues>
<Object Database="[AdventureWorks2012]" Schema="[HumanResources]" Table="[Employee]" Index="[PK_Employee_BusinessEntityID]" IndexKind="Clustered" />
</IndexScan>
</RelOp>
</ComputeScalar>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
在这里,SQL Server 已经完成了解析 SQL 并确定列值最终来自何处的所有艰苦工作(然后优化计划,但这在这里没有意义)。
第一个<OutputList>
对应于 SELECT 中的列列表。它有 5 列引用。从它们的属性可以看出,其中两个直接对应于表列,而其他三个只是引用了一些任意命名的表达式。现在您可以搜索这些表达式(如果需要,递归),它们又具有其他列引用(或者没有,如果“Expr1004”仅引用一个常量)。
因此,由于您只需要查看计划的几个元素,因此您很有可能找到源列(请注意,给定的输出列可能有多个源列)。
要获取 XML 查询计划,只需执行
SET SHOWPLAN_XML ON
<your statement>
SET SHOWPLAN_XML OFF
输出是计划(不执行语句 - 它只是估计的计划,但在这里并不重要)。
希望有帮助。