sql - Handling inheritance with overriding efficiently

Question

I have the following two data structures.

First, a list of properties applied to object triples:

Object1  Object2  Object3 Property  Value
     O1       O2       O3       P1  "abc"
     O1       O2       O3       P2  "xyz"
     O1       O3       O4       P1  "123"
     O2       O4       O5       P1  "098"

Second, an inheritance tree, viewed here as a relation:

Object    Parent
    O2        O1
    O4        O2
    O3        O1
    O5        O3
    O1      null

The semantics of this being that O2 inherits properties from O1; O4 - from O2 and O1; O3 - from O1; and O5 - from O3 and O1, in that order of precedence.
NOTE 1: I have an efficient way to select all children or all parents of a given object. This is currently implemented with left and right indexes, but hierarchyid could also work. This does not seem important right now.
NOTE 2: I have triggers in place that make sure that the "Object" column always contains all possible objects, even when they do not really have to be there (i.e. have no parent or children defined). This makes it possible to use inner joins rather than severely less efficient outer joins.
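
As a rough illustration of the left and right indexes mentioned in NOTE 1, here is a minimal Python sketch of the nested-set test "is X an ancestor of Y (or Y itself)". The numeric ids and index values are the ones inserted by the sample schema in the accepted answer further down (ids 1 to 5 standing for O1 to O5); the function names are made up for this example.

```python
# id -> (LeftIndex, RightIndex), copied from the sample schema's inserts
NESTED = {1: (1, 10), 2: (2, 5), 3: (6, 9), 4: (3, 4), 5: (7, 8)}


def is_ancestor_or_self(ancestor, node):
    """True when node's LeftIndex falls inside ancestor's [left, right] range."""
    a_left, a_right = NESTED[ancestor]
    n_left, _ = NESTED[node]
    return a_left <= n_left <= a_right


def descendants_or_self(obj):
    """All objects that inherit from obj, including obj itself."""
    return sorted(n for n in NESTED if is_ancestor_or_self(obj, n))


def ancestors_or_self(obj):
    """Ancestor chain ordered by LeftIndex: root first, obj last."""
    chain = (n for n in NESTED if is_ancestor_or_self(n, obj))
    return sorted(chain, key=lambda n: NESTED[n][0])


print(descendants_or_self(2))  # objects inheriting from O2
print(ancestors_or_self(5))    # chain from the root down to O5
```

Within a single ancestor chain a larger LeftIndex means a deeper node, which is why sorting ancestors by LeftIndex can stand in for sorting them by depth.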

The objective is: Given a pair of (Property, Value), return all object triples that have that property with that value either defined explicitly or inherited from a parent.

NOTE 1: An object triple (X,Y,Z) is considered a "parent" of triple (A,B,C) when it is true that either X = A or X is a parent of A, and the same is true for (Y,B) and (Z,C).
NOTE 2: A property defined on a closer parent "overrides" the same property defined on a more distant parent.
NOTE 3: When (A,B,C) has two parents - (X1,Y1,Z1) and (X2,Y2,Z2), then (X1,Y1,Z1) is considered a "closer" parent when:
(a) X2 is a parent of X1, or
(b) X2 = X1 and Y2 is a parent of Y1, or
(c) X2 = X1 and Y2 = Y1 and Z2 is a parent of Z1

In other words, the "closeness" in ancestry for triples is defined based on the first components of the triples first, then on the second components, then on the third components. This rule establishes an unambiguous partial order for triples in terms of ancestry.

For example, given the pair of (P1, "abc"), the result set of triples will be:

 O1, O2, O3     -- Defined explicitly
 O1, O2, O5     -- Because O5 inherits from O3
 O1, O4, O3     -- Because O4 inherits from O2
 O1, O4, O5     -- Because O4 inherits from O2 and O5 inherits from O3
 O2, O2, O3     -- Because O2 inherits from O1
 O2, O2, O5     -- Because O2 inherits from O1 and O5 inherits from O3
 O2, O4, O3     -- Because O2 inherits from O1 and O4 inherits from O2
 O3, O2, O3     -- Because O3 inherits from O1
 O3, O2, O5     -- Because O3 inherits from O1 and O5 inherits from O3
 O3, O4, O3     -- Because O3 inherits from O1 and O4 inherits from O2
 O3, O4, O5     -- Because O3 inherits from O1 and O4 inherits from O2 and O5 inherits from O3
 O4, O2, O3     -- Because O4 inherits from O1
 O4, O2, O5     -- Because O4 inherits from O1 and O5 inherits from O3
 O4, O4, O3     -- Because O4 inherits from O1 and O4 inherits from O2
 O5, O2, O3     -- Because O5 inherits from O1
 O5, O2, O5     -- Because O5 inherits from O1 and O5 inherits from O3
 O5, O4, O3     -- Because O5 inherits from O1 and O4 inherits from O2
 O5, O4, O5     -- Because O5 inherits from O1 and O4 inherits from O2 and O5 inherits from O3

Note that the triple (O2, O4, O5) is absent from this list. This is because property P1 is defined explicitly for the triple (O2, O4, O5) and this prevents that triple from inheriting that property from (O1, O2, O3). Also note that the triple (O4, O4, O5) is also absent. This is because that triple inherits its value of P1="098" from (O2, O4, O5), because it is a closer parent than (O1, O2, O3).
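
To make the rules above concrete, here is a small Python reference model of the semantics (not of the SQL). It is only a sketch: the names PARENT, DEFINITIONS, effective_value and triples_with are invented for this example, and it relies on the observation that all candidate parents of a triple lie on the same ancestor chains, so comparing depths component by component reproduces the "closer parent" rule.

```python
from itertools import product

# Inheritance tree and property table from the question
PARENT = {"O1": None, "O2": "O1", "O3": "O1", "O4": "O2", "O5": "O3"}
DEFINITIONS = [
    ("O1", "O2", "O3", "P1", "abc"),
    ("O1", "O2", "O3", "P2", "xyz"),
    ("O1", "O3", "O4", "P1", "123"),
    ("O2", "O4", "O5", "P1", "098"),
]


def ancestors_or_self(obj):
    """Chain from obj up to the root, nearest first."""
    chain = []
    while obj is not None:
        chain.append(obj)
        obj = PARENT[obj]
    return chain


def depth(obj):
    return len(ancestors_or_self(obj)) - 1


def effective_value(triple, prop):
    """Value of prop on triple, explicit or inherited from the closest parent triple."""
    candidates = [
        d for d in DEFINITIONS
        if d[3] == prop
        and all(d[i] in ancestors_or_self(triple[i]) for i in range(3))
    ]
    if not candidates:
        return None
    # Deeper ancestor = closer parent; compare first, then second, then third component
    closest = max(candidates, key=lambda d: tuple(depth(d[i]) for i in range(3)))
    return closest[4]


def triples_with(prop, value):
    return sorted(
        t for t in product(PARENT, repeat=3) if effective_value(t, prop) == value
    )


for t in triples_with("P1", "abc"):
    print(t)
```

Running it on the sample data gives the 18 triples listed above, with (O2, O4, O5) and (O4, O4, O5) excluded because both resolve to "098".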

The straightforward way to do it is the following. First, for every triple that a property is defined on, select all possible child triples:

select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value
from TriplesAndProperties tp

-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.O1
inner join Objects as Objects2 on Objects2.Id = tp.O2
inner join Objects as Objects3 on Objects3.Id = tp.O3

-- Then add all possible children of all those objects
inner join Objects as Children1 on Objects1.Id [isparentof] Children1.Id
inner join Objects as Children2 on Objects2.Id [isparentof] Children2.Id
inner join Objects as Children3 on Objects3.Id [isparentof] Children3.Id

But this is not the whole story: if some triple inherits the same property from several parents, this query will yield conflicting results. Therefore, the second step is to select just one of those conflicting results:

select * from
(
    select 
        Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
        row_number() over( 
            partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
            order by Objects1.[depthInTheTree] desc, Objects2.[depthInTheTree] desc, Objects3.[depthInTheTree] desc
        )
        as InheritancePriority
    from
    ... (see above)
) as x
where InheritancePriority = 1

The window function row_number() over( ... ) does the following: for every unique combination of object triple and property, it sorts all values by the ancestral distance from the triple to the parents that the value is inherited from, and then I select only the first value in the resulting list. A similar effect can be achieved with GROUP BY and ORDER BY clauses, but I find the window function semantically cleaner (the execution plans they yield are identical). The point is, I need to select the closest of the contributing ancestors, and for that I need to group and then sort within the group.

And finally, now I can simply filter the result set by Property and Value.
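
The three steps (expand to all child triples, keep the closest definition per triple and property, then filter) can be mimicked in plain Python to see how many intermediate rows the sort has to handle. This is an illustrative sketch on the sample data only; the helper names are hypothetical and a dictionary lookup stands in for the partition and sort.

```python
from itertools import product

PARENT = {"O1": None, "O2": "O1", "O3": "O1", "O4": "O2", "O5": "O3"}
TP = [
    ("O1", "O2", "O3", "P1", "abc"),
    ("O1", "O2", "O3", "P2", "xyz"),
    ("O1", "O3", "O4", "P1", "123"),
    ("O2", "O4", "O5", "P1", "098"),
]


def depth(obj):
    d = 0
    while PARENT[obj] is not None:
        obj = PARENT[obj]
        d += 1
    return d


def descendants_or_self(obj):
    """Every object whose ancestor chain passes through obj."""
    found = []
    for candidate in PARENT:
        node = candidate
        while node is not None:
            if node == obj:
                found.append(candidate)
                break
            node = PARENT[node]
    return found


# Step 1: one row per (defining triple, child triple), like the six-way join
expanded = [
    (c1, c2, c3, prop, value, (depth(o1), depth(o2), depth(o3)))
    for (o1, o2, o3, prop, value) in TP
    for c1, c2, c3 in product(
        descendants_or_self(o1), descendants_or_self(o2), descendants_or_self(o3)
    )
]

# Step 2: per (child triple, property) keep the row with the deepest defining triple,
# which is what row_number() ... = 1 selects
best = {}
for c1, c2, c3, prop, value, depths in expanded:
    key = (c1, c2, c3, prop)
    if key not in best or depths > best[key][0]:
        best[key] = (depths, value)


# Step 3: filter by property and value
def select(prop, value):
    return sorted(k[:3] for k, (_, v) in best.items() if k[3] == prop and v == value)


print(len(expanded), "expanded rows")
print(select("P1", "098"))
```

Even on this five-object tree, four property rows expand to 52 intermediate rows before the per-group selection, which hints at why the sort dominates as the data grows.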

This scheme works. Very reliably and predictably. It has proven to be very powerful for the business task it implements.

The only trouble is, it is awfully slow.
One might point out the join of seven tables might be slowing things down, but that is actually not the bottleneck.

According to the actual execution plan I'm getting from the SQL Management Studio (as well as SQL Profiler), the bottleneck is the sorting. The problem is, in order to satisfy my window function, the server has to sort by Children1.Id, Children2.Id, Children3.Id, tp.Property, Objects1.[depthInTheTree] descending, Objects2.[depthInTheTree] descending, Objects3.[depthInTheTree] descending, and there can be no indexes it can use, because the values come from a cross join of several tables.

EDIT: Per Michael Buen's suggestion (thank you, Michael), I have posted the whole puzzle to sqlfiddle here. One can see in the execution plan that the Sort operation accounts for 32% of the whole query, and that is going to grow with the number of total rows, because all the other operations use indexes.

Usually in such cases I would use an indexed view, but not in this case, because indexed views cannot contain self-joins, of which there are six.

The only way that I can think of so far is to create six copies of the Objects table and then use them for the joins, thus enabling an indexed view.
Has the time come for me to be reduced to hacks like that? The despair sets in.

score 2 · Accepted Answer

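I have 3 possible answers.

The sql fiddle for your question is here: http://sqlfiddle.com/#!3/7c7a0/3/0

The sql fiddle for my answer is here: http://sqlfiddle.com/#!3/5d257/1

Warnings:

1. Query Analyzer is not enough - I notice that a number of answers have been rejected because their query plans are more expensive than the original query. The analyzer is only a guide. Depending on the actual data set, hardware, and use case, a more expensive query can return results faster than a less expensive one. You have to test in your environment.
2. Query Analyzer is ineffective - Even if you find a way to remove the "most expensive step" from a query, it often makes no difference to your query.
3. Query changes alone rarely mitigate schema/design problems - Some answers have been rejected because they involve schema-level changes such as triggers and additional tables. A complex query that resists optimization is a strong sign that the problem lies in the underlying design or in my expectations. You may not like it, but you may have to accept that the problem cannot be solved at the query level.
4. Indexed views cannot contain row_number()/partition clauses - Working around the self-join problem by creating six copies of the Objects table is not enough to let you create the indexed view you suggested. I tried it in this sqlfiddle. If you uncomment the last "create index" statement, you will get an error because your view "contains ranking or aggregate window function".

Working answers:

1. Left join instead of row_number() - You can use a query that uses left joins to exclude results that have been overridden lower in the tree. Removing the final "order by" from this query actually removes the sort that has been plaguing you! The execution plan for this query is still more expensive than your original, but see warning #1 above.
2. Indexed view for part of the query - Using some serious query magic (based on this technique), I have created an indexed view for part of the query. This view can be used to enhance the original question query or answer #1.
3. Materialize into a well-indexed table - Someone else suggested this answer, but they may not have explained it well. Unless your result set is very large or you update the source tables very frequently, materializing the results of a query and using a trigger to keep them up to date is a perfectly fine way to work around this kind of problem. Once you have created a view for your query, this option is easy to test. You can reuse answer #2 to speed up the trigger, and then improve it further over time. (You are talking about creating six copies of your tables; try this first. It guarantees that the performance of the select you care about will be as good as possible.)

Here is the schema portion of my answer from sqlfiddle: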

Create Table Objects
(
    Id int not null identity primary key,
    LeftIndex int not null default 0,
    RightIndex int not null default 0
)

alter table Objects add ParentId int null references Objects

CREATE TABLE TP
(
    Object1 int not null references Objects,
    Object2 int not null references Objects,
    Object3 int not null references Objects,
    Property varchar(20) not null,
    Value varchar(50) not null
)


insert into Objects(LeftIndex, RightIndex) values(1, 10)
insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 2, 5)
insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 6, 9)
insert into Objects(ParentId, LeftIndex, RightIndex) values(2, 3, 4)
insert into Objects(ParentId, LeftIndex, RightIndex) values(3, 7, 8)

insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P1', 'abc')
insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P2', 'xyz')
insert into TP(Object1, Object2, Object3, Property, Value) values(1,3,4, 'P1', '123')
insert into TP(Object1, Object2, Object3, Property, Value) values(2,4,5, 'P1', '098')

create index ix_LeftIndex on Objects(LeftIndex)
create index ix_RightIndex on Objects(RightIndex)
create index ix_Objects on TP(Property, Value, Object1, Object2, Object3)
create index ix_Prop on TP(Property)
GO

---------- QUESTION ADDITIONAL SCHEMA --------
CREATE VIEW TPResultView AS
Select O1, O2, O3, Property, Value
FROM
(
    select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,

    row_number() over( 
        partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
        order by Objects1.LeftIndex desc, Objects2.LeftIndex desc, Objects3.LeftIndex desc
    )
    as Idx

    from tp

    -- Select corresponding objects of the triple
    inner join Objects as Objects1 on Objects1.Id = tp.Object1
    inner join Objects as Objects2 on Objects2.Id = tp.Object2
    inner join Objects as Objects3 on Objects3.Id = tp.Object3

    -- Then add all possible children of all those objects
    inner join Objects as Children1 on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
    inner join Objects as Children2 on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
    inner join Objects as Children3 on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
) as x
WHERE idx = 1 
GO

---------- ANSWER 1 SCHEMA --------

CREATE VIEW TPIntermediate AS
select tp.Property, tp.Value 
    , Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
    , Objects1.LeftIndex as PL1, Objects2.LeftIndex as PL2, Objects3.LeftIndex as PL3    
    , Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3    
    from tp

    -- Select corresponding objects of the triple
    inner join Objects as Objects1 on Objects1.Id = tp.Object1
    inner join Objects as Objects2 on Objects2.Id = tp.Object2
    inner join Objects as Objects3 on Objects3.Id = tp.Object3

    -- Then add all possible children of all those objects
    inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
    inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
    inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
GO

---------- ANSWER 2 SCHEMA --------

-- Partial calculation using an indexed view
-- Circumvented the self-join limitation using a black magic technique, based on 
-- http://jmkehayias.blogspot.com/2008/12/creating-indexed-view-with-self-join.html
CREATE TABLE dbo.multiplier (i INT PRIMARY KEY)

INSERT INTO dbo.multiplier VALUES (1) 
INSERT INTO dbo.multiplier VALUES (2) 
INSERT INTO dbo.multiplier VALUES (3) 
GO

CREATE VIEW TPIndexed
WITH SCHEMABINDING
AS

SELECT tp.Object1, tp.object2, tp.object3, tp.property, tp.value,
    SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL1,
    SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL2,
    SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL3,
    SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.RightIndex ELSE NULL END, 0)) as PR1,
    SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.RightIndex ELSE NULL END, 0)) as PR2,
    SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.RightIndex ELSE NULL END, 0)) as PR3,
    COUNT_BIG(*) as ID
    FROM dbo.tp
    cross join dbo.multiplier M 
    inner join dbo.Objects 
    on (M.i = 1 AND Objects.Id = tp.Object1)
    or (M.i = 2 AND Objects.Id = tp.Object2)
    or (M.i = 3 AND Objects.Id = tp.Object3)
GROUP BY tp.Object1, tp.object2, tp.object3, tp.property, tp.value
GO

-- This index is mostly useless but required
create UNIQUE CLUSTERED index pk_TPIndexed on dbo.TPIndexed(property, value, object1, object2, object3)
-- Once we have the clustered index, we can create a nonclustered that actually addresses our needs
create NONCLUSTERED index ix_TPIndexed on dbo.TPIndexed(property, value, PL1, PL2, PL3, PR1, PR2, PR3)
GO

-- NOTE: this view is not indexed, but it uses the indexed view 
CREATE VIEW TPIndexedResultView AS
Select O1, O2, O3, Property, Value
FROM
(
    select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,

    row_number() over( 
        partition by tp.Property, Children1.Id, Children2.Id, Children3.Id
        order by tp.Property, Tp.PL1 desc, Tp.PL2 desc, Tp.PL3 desc
    )
    as Idx

    from TPIndexed as TP WITH (NOEXPAND)

    -- Then add all possible children of all those objects
    inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
    inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
    inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3
) as x
WHERE idx = 1 
GO


-- NOTE: this view is not indexed, but it uses the indexed view 
CREATE VIEW TPIndexedIntermediate AS
select tp.Property, tp.Value 
    , Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
    , PL1, PL2, PL3    
    , Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3    
    from TPIndexed as TP WITH (NOEXPAND)

    -- Then add all possible children of all those objects
    inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
    inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
    inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3  
GO


---------- ANSWER 3 SCHEMA --------
-- You're talking about making six copies of the TP table
-- If you're going to go that far, you might as well, go the trigger route
-- The performance profile is much the same - slower on insert, faster on read
-- And instead of still recalculating on every read, you'll be recalculating
-- only when the data changes. 

CREATE TABLE TPResult
(
    Object1 int not null references Objects,
    Object2 int not null references Objects,
    Object3 int not null references Objects,
    Property varchar(20) not null,
    Value varchar(50) not null
)
GO

create UNIQUE index ix_Result on TPResult(Property, Value, Object1, Object2, Object3)


--You'll have to imagine this trigger, sql fiddle doesn't want to do it
--CREATE TRIGGER tr_TP
--ON TP
--  FOR INSERT, UPDATE, DELETE
--AS
--  DELETE FROM TPResult
-- -- For this example we'll just insert into the table once
INSERT INTO TPResult 
SELECT O1, O2, O3, Property, Value 
FROM TPResultView

Here is the queries portion of my answer from sqlfiddle:

-------- QUESTION QUERY ----------
-- Original query, modified to use the view I added
SELECT O1, O2, O3, Property, Value 
FROM TPResultView
WHERE property = 'P1' AND value = 'abc'
-- Your assertion is that this order by is the most expensive part. 
-- Sometimes converting queries into views allows the server to
-- Optimize them better over time.
-- NOTE: removing this order by has no effect on this query.
-- ORDER BY O1, O2, O3
GO

-------- ANSWER 1  QUERY ----------
-- A different way to get the same result. 
-- Query optimizer says this is more expensive, but I've seen cases where
-- it says a query is more expensive but it returns results faster.
SELECT O1, O2, O3, Property, Value
FROM (
  SELECT A.O1, A.O2, A.O3, A.Property, A.Value
  FROM TPIntermediate A
  LEFT JOIN TPIntermediate B ON A.O1 = B.O1
    AND A.O2 = B.O2
    AND A.O3 = B.O3
    AND A.Property = B.Property
    AND 
    (
      -- Find any rows with Parent LeftIndex triplet that is greater than this one
      (A.PL1 < B.PL1
      AND A.PL2 < B.PL2
      AND A.PL3 < B.PL3) 
    OR
      -- Find any rows with LeftIndex triplet that is greater than this one
      (A.CL1 < B.CL1
      AND A.CL2 < B.CL2
      AND A.CL3 < B.CL3)
    )
  -- If this row has any rows that match the previous two cases, exclude it
  WHERE B.O1 IS NULL ) AS x
WHERE property = 'P1' AND value = 'abc'
-- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
-- that has been the focus of your question.   
-- However, it wasn't clear from your question whether this order by was required.
--ORDER BY O1, O2, O3
GO

-------- ANSWER 2  QUERIES ----------
-- Same as above but using an indexed view to partially calculate results

SELECT O1, O2, O3, Property, Value 
FROM TPIndexedResultView
WHERE property = 'P1' AND value = 'abc'
-- Your assertion is that this order by is the most expensive part. 
-- Sometimes converting queries into views allows the server to
-- Optimize them better over time.
-- NOTE: removing this order by has no effect on this query.
--ORDER BY O1, O2, O3
GO

SELECT O1, O2, O3, Property, Value
FROM (
  SELECT A.O1, A.O2, A.O3, A.Property, A.Value
  FROM TPIndexedIntermediate A
  LEFT JOIN TPIndexedIntermediate B ON A.O1 = B.O1
    AND A.O2 = B.O2
    AND A.O3 = B.O3
    AND A.Property = B.Property
    AND 
    (
      -- Find any rows with Parent LeftIndex triplet that is greater than this one
      (A.PL1 < B.PL1
      AND A.PL2 < B.PL2
      AND A.PL3 < B.PL3) 
    OR
      -- Find any rows with LeftIndex triplet that is greater than this one
      (A.CL1 < B.CL1
      AND A.CL2 < B.CL2
      AND A.CL3 < B.CL3)
    )
  -- If this row has any rows that match the previous two cases, exclude it
  WHERE B.O1 IS NULL ) AS x
WHERE property = 'P1' AND value = 'abc'
-- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
-- that has been the focus of your question.   
-- However, it wasn't clear from your question whether this order by was required.
--ORDER BY O1, O2, O3
GO



-------- ANSWER 3  QUERY ----------
-- Returning results from a pre-calculated table is fast and easy
-- Unless you are doing many more inserts than reads, or your result
-- set is very large, this is a fine way to compensate for a poor design
-- in one area of your database.
SELECT Object1 as O1, Object2 as O2, Object3 as O3, Property, Value 
FROM TPResult
WHERE property = 'P1' AND value = 'abc'
ORDER BY O1, O2, O3

score 0 · Accepted Answer

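Have you tried an index (or setting the pk) with the "Value" column first, the "Property" column second, the "Object1" column third, the "Object2" column fourth, and the "Object3" column fifth? I am assuming that "Value" will be more restrictive than "Property".

I am also assuming that you have the Id column set as the primary key, and a foreign key relation between ParentId and Id.

How does this query perform?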

    with 
    -- First, get all combinations that match the property/value pair.
    validTrip as (
        select Object1, Object2, Object3
        from TriplesAndProperties 
        where value = @value
            and property = @property
    ),
    -- Recursively flatten the inheritance hierarchy of Object1, 2 and 3.
    -- Each recursive step joins a child row to its already-found parent.
    o1 as (
        select Id, 0 as InherLevel from Objects where Id in (select Object1 from validTrip)
        union all
        select rec.Id, base.InherLevel + 1 from Objects rec inner join o1 base on rec.ParentId = base.Id
    ),
    o2 as (
        select Id, 0 as InherLevel from Objects where Id in (select Object2 from validTrip)
        union all
        select rec.Id, base.InherLevel + 1 from Objects rec inner join o2 base on rec.ParentId = base.Id
    ),
    o3 as (
        select Id, 0 as InherLevel from Objects where Id in (select Object3 from validTrip)
        union all
        select rec.Id, base.InherLevel + 1 from Objects rec inner join o3 base on rec.ParentId = base.Id
    )
    -- select the Id triple.
    select o1.Id as O1, o2.Id as O2, o3.Id as O3
    -- match every option in o1, with every option in o2, with every option in o3.
    from o1
        cross join o2
        cross join o3
    -- order by the inheritance level.
    order by o1.InherLevel, o2.InherLevel, o3.InherLevel;

score 0 · Accepted Answer

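You can speed this up by materializing the join into an indexed table, say joinedresult. This has the disadvantage of needing space and of being saved to disk. But it has the advantage of being able to use an index for the slow part.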

insert into joinedresult
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value, Objects1.[depthInTheTree] as O1D, Objects2.[depthInTheTree] as O2D, Objects3.[depthInTheTree] as O3D from  ... (see above)

Make sure joinedresult has an index on [O1,O2,O3,Property,O1D,O2D,O3D] and clear it before running. Then

select * from
(
    select 
    -- joinedresult already holds the flattened columns, so no table aliases are needed
    O1, O2, O3, Property, Value,
    row_number() over( 
        partition by O1, O2, O3, Property
        order by O1D desc, O2D desc, O3D desc
    )
    as InheritancePriority
    from joinedresult
) as x
where InheritancePriority = 1

score 0 · Accepted Answer

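Caching is the key to making a query faster. It reduces the calculations you have to make. You want to create an index because you want to CACHE and save WORK. Below are two possibilities for doing this.

Option 1

The SQL database sorts because of your window function. And you say the window function is too slow.

I don't know how well this will work, but it might work.

Instead of sorting by several columns, you can try sorting by a single column - "closeness".

For now, let's define closeness as some abstract integer. Instead of your window function, you can use the following SQL: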

select * from
(
    select
        Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,

        row_number() over( 
            partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
            order by closeness DESC
        )
        as InheritancePriority
    from
    ... (see above)
)
where InheritancePriority = 1

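closeness can be a column defined in the TriplesAndProperties table. For each object, you can define its "closeness" as its distance from the root node (O1). Then we can define closeness(tuple) = closeness(Object1)*100+closeness(Object2)*10+closeness(Object3)

This way, the tuple farthest from the root is the one you want.

To avoid sorting, you just have to make sure that closeness is indexed.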
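
A quick way to sanity-check the formula is to compute it in Python on depth triples. The sketch below is an illustration only; the function name and the base parameter are introduced here and are not part of the answer. It shows that the weights 100/10/1 order triples like the three-column sort only while every depth stays below 10, and that a larger base (an assumption about how one might generalize the idea) avoids the collision.

```python
def closeness(depths, base=10):
    """Pack three per-object depths into one integer, first component most significant."""
    d1, d2, d3 = depths
    return d1 * base * base + d2 * base + d3


# Depths of the defining triples from the question's sample data
print(closeness((0, 1, 1)))  # (O1, O2, O3)
print(closeness((1, 2, 2)))  # (O2, O4, O5), the closer parent scores higher

# With base 10, a depth of 10 in the second component spills into the first
print(closeness((1, 0, 0)) == closeness((0, 10, 0)))

# Any base larger than the maximum tree depth keeps the lexicographic order
print(closeness((1, 0, 0), base=11) > closeness((0, 10, 0), base=11))
```

If the tree can be deeper than nine levels, the multiplier would need to exceed the maximum depth for the single column to sort the same way as the original three columns.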

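Option 2

I am VERY confident that this will work.

Define your TriplesAndProperties table to have the following columns: Object1, Object2, Object3, Property, Value, Effective_Object1, Effective_Object2, Effective_Object3, Closeness.

Note that here I also define closeness as a column.

When you insert/update a tuple (X,Y,Z) into your table, you want to insert: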

(X,Y,Z,Property,Value,X,Y,Z,0)
(X,Y,Z,Property,Value,X,Y,Z.child,1)
(X,Y,Z,Property,Value,X,Y,Z.grandchild,2)
(X,Y,Z,Property,Value,X,Y.child,Z,10)
(X,Y,Z,Property,Value,X,Y.child,Z.child,11)
(X,Y,Z,Property,Value,X,Y.child,Z.grandchild,12)
(X,Y,Z,Property,Value,X,Y.grandchild,Z,20)
(X,Y,Z,Property,Value,X,Y.grandchild,Z.child,21)
(X,Y,Z,Property,Value,X,Y.grandchild,Z.grandchild,22)
...
...

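This means that instead of inserting/updating/destroying a single row in your table, you are inserting up to ~20 rows. That is not too bad.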
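
Here is a minimal Python sketch of that expansion on the question's sample tree, to show what the generated rows could look like. The helper names are hypothetical, and the weight of 100 for the first component is an assumption extrapolated from the 10 and 1 visible in the list above.

```python
from itertools import product

PARENT = {"O1": None, "O2": "O1", "O3": "O1", "O4": "O2", "O5": "O3"}


def descendants_with_distance(obj):
    """Map each descendant-or-self of obj to the number of levels below obj."""
    found = {}
    for candidate in PARENT:
        node, steps = candidate, 0
        while node is not None:
            if node == obj:
                found[candidate] = steps
                break
            node, steps = PARENT[node], steps + 1
    return found


def expand(x, y, z, prop, value):
    """Rows (X, Y, Z, Property, Value, Effective1, Effective2, Effective3, Closeness)."""
    rows = []
    for (ex, dx), (ey, dy), (ez, dz) in product(
        descendants_with_distance(x).items(),
        descendants_with_distance(y).items(),
        descendants_with_distance(z).items(),
    ):
        # Assumed weights: 100 for the first object, 10 for the second, 1 for the third
        rows.append((x, y, z, prop, value, ex, ey, ez, 100 * dx + 10 * dy + dz))
    return rows


rows = expand("O1", "O2", "O3", "P1", "abc")
print(len(rows))
for row in rows[:3]:
    print(row)
```

For (O1, O2, O3) this produces 20 rows on the sample tree. The row for the defining triple itself carries closeness 0, and larger values mean the effective triple sits further below the definition.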

Then your query is VERY EASY.

You just say:

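SELECT * FROM
    (
    SELECT Effective_Object1, Effective_Object2, Effective_Object3, Property, Value,
        row_number() over( 
            partition by Effective_Object1, Effective_Object2, Effective_Object3, Property
            -- In this option Closeness is 0 for an explicit definition and grows with
            -- the distance from the defining triple, so the nearest definition sorts first
            order by Closeness ASC
        ) AS InheritancePriority FROM TriplesAndProperties
     ) AS x WHERE InheritancePriority = 1;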

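In this option, you have to make sure that closeness is indexed; you can simply index by the tuple (Effective_Object1, Effective_Object2, Effective_Object3, Property, Closeness).

In both cases, you have some amount of caching, that is, data that does not add any extra information but caches a certain amount of calculation or work.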

score 0 · Accepted Answer

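I am guessing your table is rather large, hence the slowness. In that case I am also guessing that you have multiple properties (2 to many). If so, I would suggest moving the "where property='P1'" inside the CTE. This filters out a good part of the data, making your query faster by a factor of up to the number of properties.

Something like: http://sqlfiddle.com/#!3/7c7a0/92/0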

score 0 · Accepted Answer

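Hierarchical queries, that is WITH RECURSIVE ... or proprietary equivalents such as CONNECT BY, are your friend in this case.

The recipe for your particular problem is: start from the leaf and ascend to the root, aggregating and excluding anything that has already been found.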
