I was going through this post on record-level versioning of tables. The architecture it describes relies on history tables. My scenario, however, does not require rollback, only retrieval of records as of a point in time. For that I have tried a design that uses a single table for versioning. Note that the table shown here is bare-bones (no constraints, indices, etc.). I intend to index on id, since the queries involve a group by clause on that column.

For example, I have got a table Test where

id is the identifier,

modstamp is the timestamp of the data (never null)

In addition to the columns above, the table will contain bookkeeping columns

local_modstamp is the timestamp at which the record was updated

del_modstamp is the timestamp at which the record was deleted

During backup, all the records are obtained from the source and inserted with local_modstamp = null and del_modstamp = null.

id |modstamp                   |local_modstamp |del_modstamp |
---|---------------------------|---------------|-------------|
1  |2016-08-01 15:35:32 +00:00 |               |             |
2  |2016-07-29 13:39:45 +00:00 |               |             |
3  |2016-07-21 10:15:09 +00:00 |               |             |

Once the records are obtained, these are the scenarios for handling the data (assuming the reference time [ref_time] is the time at which the process is run):

  1. Insert: Insert the record as normal.

  2. Update: Update the most recent record with local_modstamp = ref_time, then insert the new record. The queries would be:
       update test set local_modstamp = <ref_time> where id = <id> and local_modstamp is null and del_modstamp is null;
       insert into test values(...);

  3. Delete: Update the most recent record with del_modstamp = ref_time:
       update test set del_modstamp = <ref_time> where id = <id> and local_modstamp is null and del_modstamp is null;
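For concreteness, here is a minimal T-SQL sketch of the three scenarios run against the sample rows. The column types, the literal ref_time (2016-08-02 00:00:00 +00:00) and the new modstamp for id 1 are assumptions made only for illustration. The current row is taken to be the one whose bookkeeping columns are both still null, matching how the backup rows are inserted.

```sql
-- Hypothetical DDL for the Test table (column types are assumptions).
create table test (
    id             int            not null,
    modstamp       datetimeoffset not null,
    local_modstamp datetimeoffset null,  -- set when the row is superseded by an update
    del_modstamp   datetimeoffset null   -- set when the row is deleted
);

-- Scenario 1 (insert): backup rows arrive with both bookkeeping columns null.
insert into test (id, modstamp) values
    (1, '2016-08-01 15:35:32 +00:00'),
    (2, '2016-07-29 13:39:45 +00:00'),
    (3, '2016-07-21 10:15:09 +00:00');

-- Scenario 2 (update of id 1): close the current row, then insert the new one.
update test
set    local_modstamp = '2016-08-02 00:00:00 +00:00'   -- assumed ref_time
where  id = 1
  and  local_modstamp is null
  and  del_modstamp is null;

insert into test (id, modstamp)
values (1, '2016-08-01 20:00:00 +00:00');              -- assumed new source timestamp

-- Scenario 3 (delete of id 2): stamp the current row as deleted.
update test
set    del_modstamp = '2016-08-02 00:00:00 +00:00'     -- assumed ref_time
where  id = 2
  and  local_modstamp is null
  and  del_modstamp is null;
```

After these statements the table holds two rows for id 1 (the older one stamped in local_modstamp), one row for id 2 stamped in del_modstamp, and the untouched row for id 3.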

The design aims at identifying the latest records as those where local_modstamp is null and del_modstamp is null. However, I ran into an issue when trying to retrieve point-in-time records with this query (the inner-most query):

select id, max(modstamp) from test where modstamp <= <ref_time> and (del_modstamp is null or del_modstamp <= <ref_time>) group by id;

It seems that I have made a mistake (have I?) by using null as a placeholder to identify the latest records of the table. Is there a way to use the existing design to obtain the point-in-time records?

If not, I guess the probable solution is to set local_modstamp on the latest records as well. This would require changing the update logic to use max(local_modstamp). Can I keep my existing architecture and still retrieve the point-in-time data?

I am using SQL Server right now, but this design may be extended to other database products too. I would prefer a general approach to retrieving the data over vendor-specific hacks.


1 Answer


Introducing Version Normal Form. Consider this table:

create table Entities(
    ID     int identity primary key,
    S1     [type],  -- Static data
    Sn     [type],  -- more static data
    V1     [type],  -- Volatile data
    Vn     [type]   -- more volatile data
);

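Static data is data that does not change during the lifetime of the entity, or whose changes do not need to be tracked. Volatile data changes, and those changes must be tracked.

Move the volatile attributes to a separate table: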

create table EntityVersions(
    ID        int  not null,
    Effective date not null default current_timestamp,
    Deleted   bit  not null default 0,
    V1        [type],
    Vn        [type],
    constraint PK_EntityVersions primary key( ID, Effective ),
    constraint FK_EntityVersionEntity foreign key( ID )
        references Entities( ID )
);

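The entity table no longer contains the volatile attributes.

An insert operation creates the master entity record with the static data, generating the unique ID value. That value is used to insert the first version, holding the initial values of the volatile data. An update generally does nothing to the master table (unless a static value actually changes) and writes a new version containing the new volatile data to the version table. Note that no changes are made to existing versions, in particular to the latest or "current" version. The new version is inserted and the operation is done.

To "undo" the latest version, or in fact any version, simply delete that version from the version table.

For example, take an Employees table with the following attributes: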
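The insert, update and undo operations just described could look roughly like this in T-SQL. This is only a sketch: the tables are reduced to one static and one volatile column with assumed varchar types, the dates and values are made up, and scope_identity() is the SQL Server way of reading back the generated ID (other products have their own equivalent).

```sql
-- Hypothetical concrete instance of the two tables (types are assumptions).
create table Entities(
    ID  int identity primary key,
    S1  varchar(50) not null        -- static data
);

create table EntityVersions(
    ID        int  not null,
    Effective date not null default current_timestamp,
    Deleted   bit  not null default 0,
    V1        varchar(50),          -- volatile data
    constraint PK_EntityVersions primary key( ID, Effective ),
    constraint FK_EntityVersionEntity foreign key( ID )
        references Entities( ID )
);

-- Insert: create the master row, capture the generated ID, write the first version.
declare @ID int;
insert into Entities( S1 ) values( 'static value' );
set @ID = scope_identity();
insert into EntityVersions( ID, Effective, V1 )
values( @ID, '2016-01-01', 'initial value' );

-- Update: the master row is untouched; only a new version is written.
insert into EntityVersions( ID, Effective, V1 )
values( @ID, '2016-03-01', 'changed value' );

-- Undo: delete the unwanted version; the earlier version is the latest one again.
delete from EntityVersions
where  ID = @ID
   and Effective = '2016-03-01';
```

Because existing versions are never updated, every change is a plain insert, and an undo is a single-row delete keyed on ( ID, Effective ).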

EmployeeNum, HireDate, FirstName, LastName, PayRate, Dept, PhoneExt

Of course, EmployeeNum is static, as are HireDate and FirstName. PhoneExt may change from time to time, but we don't care, so it is designated static. The final design is:

Employees_S
===========
  EmployeeNum (PK), HireDate, FirstName, PhoneExt

Employees_V
===========
  EmployeeNum (PK), Effective (PK), IsDeleted, LastName, PayRate, Dept

On Jan 1, 2016 we hired Sally Smith. The static data is inserted into Employees_S, generating an EmployeeNum value of 1001. We use that value to insert the first version.

Employees_S
===========
  1001, 2016-01-01, Sally, 12345

Employees_V
===========
  1001, 2016-01-01, 0, Smith, 35.00, Eng

On March 1, she gets a pay raise:

Employees_S
===========
  1001, 2016-01-01, Sally, 12345

Employees_V
===========
  1001, 2016-01-01, 0, Smith, 35.00, Eng
  1001, 2016-03-01, 0, Smith, 40.00, Eng

On May 1, she gets married:

Employees_S
===========
  1001, 2016-01-01, Sally, 12345

Employees_V
===========
  1001, 2016-01-01, 0, Smith, 35.00, Eng
  1001, 2016-03-01, 0, Smith, 40.00, Eng
  1001, 2016-05-01, 0, Jones, 40.00, Eng
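As a sketch, the history so far could be produced by the statements below. The column types are assumptions (the design only lists column names), and EmployeeNum is inserted explicitly rather than generated, so that the example yields the value 1001 used in the text.

```sql
-- Assumed column types; only the column names come from the design above.
create table Employees_S(
    EmployeeNum int         not null primary key,
    HireDate    date        not null,
    FirstName   varchar(50) not null,
    PhoneExt    varchar(10) null
);

create table Employees_V(
    EmployeeNum int          not null,
    Effective   date         not null,
    IsDeleted   bit          not null default 0,
    LastName    varchar(50)  not null,
    PayRate     decimal(9,2) not null,
    Dept        varchar(10)  not null,
    constraint PK_Employees_V primary key( EmployeeNum, Effective ),
    constraint FK_Employees_V_S foreign key( EmployeeNum )
        references Employees_S( EmployeeNum )
);

-- Jan 1: hire. One static row plus the first version.
insert into Employees_S( EmployeeNum, HireDate, FirstName, PhoneExt )
values( 1001, '2016-01-01', 'Sally', '12345' );

insert into Employees_V( EmployeeNum, Effective, LastName, PayRate, Dept )
values( 1001, '2016-01-01', 'Smith', 35.00, 'Eng' );

-- Mar 1: pay raise. A new version only; nothing existing is updated.
insert into Employees_V( EmployeeNum, Effective, LastName, PayRate, Dept )
values( 1001, '2016-03-01', 'Smith', 40.00, 'Eng' );

-- May 1: marriage. Again just one more version.
insert into Employees_V( EmployeeNum, Effective, LastName, PayRate, Dept )
values( 1001, '2016-05-01', 'Jones', 40.00, 'Eng' );
```

Each version row repeats every volatile column, including the ones that did not change, which is what lets a single row describe the employee's full state as of its Effective date.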

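Note that versions of the same entity are completely independent of each other, apart from the restriction that no two of them may have the same Effective date.

To see the current state of employee 1001, here is the query: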

select  s.EmployeeNum, s.HireDate, s.FirstName, v.LastName, v.PayRate, v.Dept, s.PhoneExt
from    Employees_S s
join    Employees_V v
    on  v.EmployeeNum = s.EmployeeNum
    and v.Effective = ( select  Max( Effective )
                        from    Employees_V
                        where   EmployeeNum = v.EmployeeNum
                            and Effective <= current_timestamp )
where   s.EmployeeNum = 1001
    and v.IsDeleted = 0;

Here is the cool part. To see the state of employee 1001 as of, say, Feb 11, here is the query:

select  s.EmployeeNum, s.HireDate, s.FirstName, v.LastName, v.PayRate, v.Dept, s.PhoneExt
from    Employees_S s
join    Employees_V v
    on  v.EmployeeNum = s.EmployeeNum
    and v.Effective = ( select  Max( Effective )
                        from    Employees_V
                        where   EmployeeNum = v.EmployeeNum
                            and Effective <= '2016-02-11' )
where   s.EmployeeNum = 1001
    and v.IsDeleted = 0;

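It is the same query, except for the last line of the subquery. Current and historical data live in the same table and are queried with the same statement.

Here is another cool feature. It is July 1, and we know that on Sep 1 Sally will transfer to the Marketing department, with another raise. The paperwork has already gone through. Go ahead and insert the new data: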
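Since the as-of date is the only thing that varies, it can be pulled out into a parameter. The sketch below does that for all employees at once. The variable name @AsOf and the table aliases are my own, and the declare syntax is T-SQL. It assumes Employees_S and Employees_V exist with the columns listed in the design.

```sql
-- @AsOf is a hypothetical parameter name; set it to today's date for the current state.
declare @AsOf date = '2016-02-11';

select  s.EmployeeNum, s.HireDate, s.FirstName,
        v.LastName, v.PayRate, v.Dept, s.PhoneExt,
        v.Effective                      -- which version was in effect
from    Employees_S s
join    Employees_V v
    on  v.EmployeeNum = s.EmployeeNum
    and v.Effective = ( select  max( v2.Effective )
                        from    Employees_V v2
                        where   v2.EmployeeNum = v.EmployeeNum
                            and v2.Effective <= @AsOf )
where   v.IsDeleted = 0;
```

With the three versions shown for employee 1001, an @AsOf of 2016-02-11 selects the 2016-01-01 version (Smith, 35.00, Eng). Presumably a delete is recorded as one more version with IsDeleted = 1, so that the employee drops out of the result from that date onward while earlier as-of dates still return data, though that step is not spelled out here.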

Employees_S
===========
  1001, 2016-01-01, Sally, 12345

Employees_V
===========
  1001, 2016-01-01, 0, Smith, 35.00, Eng
  1001, 2016-03-01, 0, Smith, 40.00, Eng
  1001, 2016-05-01, 0, Jones, 40.00, Eng
  1001, 2016-09-01, 0, Jones, 50.00, Mkt

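The next-to-last version will still show up as the current version, but the first query executed on or after Sep 1 will show the Marketing data.

Here are the slides of a presentation I have given a few times at tech fairs. It contains more details on how all of the above can be done, including the queries. Here is a more detailed document.

answered 2016-08-03T06:11:36.777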
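A quick way to check this behaviour is to insert the future-dated version and ask for the state on two dates side by side. In this sketch the derived table d and its column AsOf are names I made up, and the values( ... ) table constructor is supported by SQL Server and several other products but not by all. It assumes the Employees_V rows shown above are already in place.

```sql
-- Jul 1: record the transfer that takes effect on Sep 1.
insert into Employees_V( EmployeeNum, Effective, LastName, PayRate, Dept )
values( 1001, '2016-09-01', 'Jones', 50.00, 'Mkt' );

-- State of employee 1001 on two different dates, one result row per date.
select  d.AsOf, v.Effective, v.LastName, v.PayRate, v.Dept
from    ( values ( cast( '2016-07-01' as date ) ),
                 ( cast( '2016-09-01' as date ) ) ) as d ( AsOf )
join    Employees_V v
    on  v.EmployeeNum = 1001
    and v.Effective = ( select  max( v2.Effective )
                        from    Employees_V v2
                        where   v2.EmployeeNum = v.EmployeeNum
                            and v2.Effective <= d.AsOf )
where   v.IsDeleted = 0
order by d.AsOf;
```

For 2016-07-01 this picks the 2016-05-01 version (Jones, 40.00, Eng), and for 2016-09-01 it picks the new version (Jones, 50.00, Mkt), without any row having been updated in between.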