sql - Select all paths from XML with graph-like content

Question

I have an XML column with elements like this:

<Root>
    <Word Type="pre1" Value="A" />
    <Word Type="pre1" Value="D" />

    <Word Type="base" Value="B" />

    <Word Type="post1" Value="C" />
    <Word Type="post1" Value="E" />
    <Word Type="post1" Value="F" />
</Root>

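The model is a graph (the question showed it as an image), and I want to select all possible paths through it using XQuery in MSSQL, to get a result like this: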

ABC ABE ABF DBC DBE DBF

Or, for an input like this:

<Root>
    <Word Type="pre1" Value="A" />
    <Word Type="pre1" Value="D" />

    <Word Type="pre2" Value="G" />
    <Word Type="pre2" Value="H" />

    <Word Type="base" Value="B" />

    <Word Type="post1" Value="C" />
    <Word Type="post1" Value="E" />
    <Word Type="post1" Value="F" />
</Root>

The result would be:

AHBC AHBE AHBF DHBC DHBE DHBF AGBC AGBE AGBF DGBC DGBE DGBF

score 6 · Accepted Answer

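You can use a CTE to build a list of the unique types and then use that list in a recursive CTE to build the strings. Finally, you pick out the strings generated in the last iteration.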

with Types as
(
  select row_number() over(order by T.N) as ID,
         T.N.value('.', 'varchar(10)') as Type
  from (select @XML.query('for $t in distinct-values(/Root/Word/@Type) 
                           return <T>{$t}</T>')
       ) as X(T)
    cross apply X.T.nodes('/T') as T(N)
),
Recu as
(
  select T.Type,
         T.ID,
         X.N.value('@Value', 'varchar(max)')  as Value
  from Types as T
    cross apply @XML.nodes('/Root/Word[@Type=sql:column("T.Type")]') as X(N)
  where T.ID = 1
  union all
  select T.Type,
         T.ID,
         R.Value+X.N.value('@Value', 'varchar(max)') as Value
  from Types as T
    inner join Recu as R
      on T.ID = R.ID + 1
    cross apply @XML.nodes('/Root/Word[@Type=sql:column("T.Type")]') as X(N)    
)
select R.Value
from Recu as R
where R.ID = (select max(T.ID) from Types as T)
order by R.Value

SQL Fiddle
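The query above reads from a variable named @XML that the snippet itself does not declare. A minimal setup sketch, assuming the second sample document from the question and SQL Server 2008 or later (for the inline initializer). The final select is only an illustrative check that the document shreds into one row per Word element:

```sql
-- Assumed setup: the sample document from the question, held in a variable named @XML.
declare @XML xml = N'
<Root>
    <Word Type="pre1" Value="A" />
    <Word Type="pre1" Value="D" />
    <Word Type="pre2" Value="G" />
    <Word Type="pre2" Value="H" />
    <Word Type="base" Value="B" />
    <Word Type="post1" Value="C" />
    <Word Type="post1" Value="E" />
    <Word Type="post1" Value="F" />
</Root>';

-- Illustrative check: one row per Word element, with its Type and Value attributes.
select W.N.value('@Type',  'varchar(10)') as Type,
       W.N.value('@Value', 'varchar(10)') as Value
from @XML.nodes('/Root/Word') as W(N);
```

The declaration has to run in the same batch as the query that uses it, because a variable's scope ends with the batch.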

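Update

Here is a version that performs better. It shreds the XML into two temp tables: one with a row for each type and one with all the words. The recursive CTE is still needed, but it uses the tables instead of the XML. Each temp table also has an index that is used by the joins in the CTE.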

-- Table to hold all values
create table #Values
(
  Type varchar(10),
  Value varchar(10)
);

-- Clustered index on Type is used in the CTE
create clustered index IX_#Values_Type on #Values(Type)

insert into #Values(Type, Value)
select T.N.value('@Type', 'varchar(10)'),
       T.N.value('@Value', 'varchar(10)')
from @XML.nodes('/Root/Word') as T(N);

-- Table that holds one row for each Type
create table #Types
(
  ID int identity,
  Type varchar(10),
  primary key (ID)
);

-- Add types by document order
-- Table-Valued Function Showplan Operator for nodes guarantees document order
insert into #Types(Type)
select T.Type
from (
     select row_number() over(order by T.N) as rn,
            T.N.value('@Type', 'varchar(10)') as Type
     from @XML.nodes('/Root/Word') as T(N)
     ) as T
group by T.Type
order by min(T.rn);

-- Last level of types
declare @MaxID int;
set @MaxID = (select max(ID) from #Types);

-- Recursive CTE that builds the strings
with C as 
(
  select T.ID,
         T.Type,
         cast(V.Value as varchar(max)) as Value
  from #Types as T
    inner join #Values as V
      on T.Type = V.Type
  where T.ID = 1
  union all
  select T.ID,
         T.Type,
         C.Value + V.Value
  from #Types as T
    inner join C
      on T.ID = C.ID + 1
    inner join #Values as V
      on T.Type = V.Type
)
select C.Value
from C
where C.ID = @MaxID
order by C.Value;

-- Cleanup
drop table #Types;
drop table #Values;

SQL Fiddle

score 4 · Accepted Answer

You need the cross product of these three sets of elements, so basically write a join without any conditions:

for $pre  in //Word[@Type="pre1"]
for $base in //Word[@Type="base"]
for $post in //Word[@Type="post1"]
return concat($pre/@Value, $base/@Value, $post/@Value)
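As a rough T-SQL counterpart to the FLWOR expression (not part of the original answer), the same unconditional join can be sketched with one nodes() call per type, combined with cross apply. The variable @XML, the aliases P, B and S, and the column name PathValue are assumptions made for illustration:

```sql
-- Assumed setup: the first sample document from the question.
declare @XML xml = N'
<Root>
    <Word Type="pre1" Value="A" />
    <Word Type="pre1" Value="D" />
    <Word Type="base" Value="B" />
    <Word Type="post1" Value="C" />
    <Word Type="post1" Value="E" />
    <Word Type="post1" Value="F" />
</Root>';

-- One nodes() call per type; cross apply without a join condition
-- pairs every pre1 word with every base word and every post1 word.
select P.N.value('@Value', 'varchar(10)')
     + B.N.value('@Value', 'varchar(10)')
     + S.N.value('@Value', 'varchar(10)') as PathValue
from @XML.nodes('/Root/Word[@Type="pre1"]') as P(N)
  cross apply @XML.nodes('/Root/Word[@Type="base"]') as B(N)
  cross apply @XML.nodes('/Root/Word[@Type="post1"]') as S(N)
order by PathValue;
```

For the first sample this should return the six strings asked for in the question: ABC, ABE, ABF, DBC, DBE, DBF. Like the FLWOR version, it hard-codes the three types, so it has to be rewritten whenever a type is added.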

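For the extended version, I used two helper functions that fetch all the attributes and then recursively concatenate the results.

It seems that MSSQL does not allow custom XQuery functions. This code is valid for conforming XQuery 1.0 (and newer) processors.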

declare function local:call($prefix as xs:string) as xs:string* {
  local:recursion('', 
    for $value in distinct-values(//Word/@Type[starts-with(., $prefix)])
    order by $value
    return $value
  )
};

declare function local:recursion($strings as xs:string*, $attributes as xs:string*) as xs:string* {
  if (empty($attributes))
  then $strings
  else
    for $string in $strings
    for $append in //Word[@Type=$attributes[1]]
    return local:recursion(concat($string, $append/@Value), $attributes[position() != 1])
};

for $pre in local:call('pre')
for $base in local:call('base')
for $post in local:call('post')
return concat($pre, $base, $post)

score 4 · Accepted Answer

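If I understand your XML correctly, all of your graphs are essentially sequences of steps, in which no step can be omitted and each step may have several alternatives. (So the set of paths through the graph is essentially the Cartesian product of the various sets of alternatives.) If that is not true, then what follows is not what you want.

The simplest way to get the Cartesian product here is to use an XQuery FLWOR expression with one for clause for each factor in the Cartesian product, as illustrated in Jens Erat's initial answer.

If you don't know in advance how many factors there will be (because you don't know what sequence of "Type" values may occur in the graph), and you don't want to reformulate the query each time, then the simplest thing to do is to write a recursive function that takes the sequence of "Type" values as one argument and the "Root" element you are working on as another, and handles one factor at a time.

For your sample input, this function does the job: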

declare function local:cartesian-product(
  $doc as element(),
  $types as xs:string*
) as xs:string* {

  (: If we have no $types left, we are done.
     Return the empty string. :)
  if (empty($types)) then 
     ''

  (: Otherwise, take the first value off the 
     sequence of types and return the Cartesian
     product of all Words with that type and
     the Cartesian product of all the remaining
     types. :)
  else
     let $t := $types[1],
         $rest := $types[position() > 1]
     for $val in $doc/Word[@Type = $t]/@Value
     for $suffix in 
         local:cartesian-product($doc,$rest)
     return concat($val, $suffix)
  };

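The only remaining problem is the slightly tricky one of getting the sequence of distinct "Type" values in document order. We could call distinct-values($doc//Word/@Type) to get the values, but there is no guarantee that they would be in document order.

Borrowing from Dimitre Novatchev's solution to a related problem, we can calculate an appropriate sequence of "Type" values as follows: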

let $doc := <Root>
    <Word Type="pre1" Value="A" />
    <Word Type="pre1" Value="D" />

    <Word Type="pre2" Value="G" />
    <Word Type="pre2" Value="H" />

    <Word Type="base" Value="B" />

    <Word Type="post1" Value="C" />
    <Word Type="post1" Value="E" />
    <Word Type="post1" Value="F" />
</Root>

let $types0 := ($doc/Word/@Type),
    $types  := $types0[index-of($types0,.)[1]]

This returns the distinct values, in document order.

Now we are ready to calculate your desired result:

return local:cartesian-product($doc, $types)

The results come back in a slightly different order from the one you gave; I assume you don't care about the order of the results:

AGBC AGBE AGBF AHBC AHBE AHBF DGBC DGBE DGBF DHBC DHBE DHBF
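As a sanity check on the list above, the number of paths should equal the product of the number of alternatives at each step (2 × 2 × 1 × 3 = 12 here). A rough T-SQL sketch of that check, not part of the original answer: the variable @XML and the column names Cnt and PathCount are assumptions, and because T-SQL has no product aggregate the product is emulated with exp(sum(log(...))) and rounded:

```sql
-- Assumed setup: the second sample document from the question.
declare @XML xml = N'
<Root>
    <Word Type="pre1" Value="A" />
    <Word Type="pre1" Value="D" />
    <Word Type="pre2" Value="G" />
    <Word Type="pre2" Value="H" />
    <Word Type="base" Value="B" />
    <Word Type="post1" Value="C" />
    <Word Type="post1" Value="E" />
    <Word Type="post1" Value="F" />
</Root>';

-- Count the words per type, then multiply the counts.
-- exp(sum(log(x))) stands in for the missing product aggregate;
-- round() removes the floating-point error before the cast to int.
select cast(round(exp(sum(log(C.Cnt))), 0) as int) as PathCount
from (
     select W.Type,
            count(*) as Cnt
     from (
          select T.N.value('@Type', 'varchar(10)') as Type
          from @XML.nodes('/Root/Word') as T(N)
          ) as W
     group by W.Type
     ) as C;
```

The type is extracted in a derived table before grouping because SQL Server does not accept XML data type methods directly in a group by clause.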
