I was going through this post on record-level versioning of tables. I noticed that its architecture relies on history tables. However, my scenario does not require rollback, only retrieval of point-in-time records. So I have tried a design that uses a single table for versioning. Note that this is bare-bones table data (no constraints, indices, etc.). I intend to index on id, since the queries involve a group by clause on that column.
For example, I have a table Test where:
id is the identifier,
modstamp is the timestamp of the data (never null).
In addition to the columns above, the table will contain bookkeeping columns:
local_modstamp is the timestamp at which the record was updated,
del_modstamp is the timestamp at which the record was deleted.
During backup, all the records are obtained from the source and inserted with local_modstamp = null and del_modstamp = null.
id | modstamp                   | local_modstamp | del_modstamp |
---|----------------------------|----------------|--------------|
1  | 2016-08-01 15:35:32 +00:00 |                |              |
2  | 2016-07-29 13:39:45 +00:00 |                |              |
3  | 2016-07-21 10:15:09 +00:00 |                |              |
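To make the starting state concrete, here is a minimal sketch of the table and the initial backup load. It uses Python's sqlite3 module purely as a runnable harness; the column types, the helper name load_backup, and storing timestamps as UTC text without the +00:00 suffix are my assumptions, not part of the design itself.

```python
import sqlite3

# Hypothetical bare-bones schema mirroring the columns described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table test (
        id             integer not null,
        modstamp       text    not null,  -- source timestamp, never null
        local_modstamp text,              -- set when the row is updated
        del_modstamp   text               -- set when the row is deleted
    )
    """
)


def load_backup(conn, rows):
    """Insert source rows; both bookkeeping columns start out as null."""
    conn.executemany(
        "insert into test (id, modstamp, local_modstamp, del_modstamp) "
        "values (?, ?, null, null)",
        rows,
    )


load_backup(conn, [
    (1, "2016-08-01 15:35:32"),
    (2, "2016-07-29 13:39:45"),
    (3, "2016-07-21 10:15:09"),
])

# Rows with both bookkeeping columns null are the current ones.
current = conn.execute(
    "select id, modstamp from test "
    "where local_modstamp is null and del_modstamp is null order by id"
).fetchall()
print(current)
```

Right after the first load, every row counts as current because nothing has been updated or deleted yet.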
Once the records are obtained, these are the scenarios for handling the data (assuming the reference time [ref_time] is the time at which the process is run):
Insert: Insert the record as normal.
Update: Update the most recent record with local_modstamp = ref_time, then insert the new record. The queries would be:
update test set local_modstamp = <ref_time> where id = <id> and local_modstamp is null and del_modstamp is null;
insert into test values (...);
Delete: Update the most recent record with del_modstamp = ref_time. The query would be:
update test set del_modstamp = <ref_time> where id = <id> and local_modstamp is null and del_modstamp is null;
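The three scenarios above could be sketched roughly as follows, again with sqlite3 as a stand-in harness. The function names (apply_insert, apply_update, apply_delete) and the sample timestamps are hypothetical. The sketch assumes that the "most recent record" for an id is the one whose local_modstamp and del_modstamp are both null, which follows from how the backup load populates the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table test (id integer not null, modstamp text not null, "
    "local_modstamp text, del_modstamp text)"
)

# Assumption: a row is "current" while both bookkeeping columns are null.
CURRENT = "local_modstamp is null and del_modstamp is null"


def apply_insert(conn, rec_id, modstamp):
    # New rows start with null bookkeeping columns.
    conn.execute(
        "insert into test (id, modstamp) values (?, ?)", (rec_id, modstamp)
    )


def apply_update(conn, rec_id, modstamp, ref_time):
    # Stamp the current version as superseded, then add the new version.
    conn.execute(
        f"update test set local_modstamp = ? where id = ? and {CURRENT}",
        (ref_time, rec_id),
    )
    apply_insert(conn, rec_id, modstamp)


def apply_delete(conn, rec_id, ref_time):
    # Stamp the current version as deleted; no new row is written.
    conn.execute(
        f"update test set del_modstamp = ? where id = ? and {CURRENT}",
        (ref_time, rec_id),
    )


apply_insert(conn, 1, "2016-08-01 15:35:32")
apply_insert(conn, 2, "2016-07-29 13:39:45")
apply_update(conn, 1, "2016-08-05 09:00:00", ref_time="2016-08-06 00:00:00")
apply_delete(conn, 2, ref_time="2016-08-06 00:00:00")

rows = conn.execute(
    "select id, modstamp, local_modstamp, del_modstamp "
    "from test order by id, modstamp"
).fetchall()
for row in rows:
    print(row)
```

After this run, id 1 has one superseded version and one current version, while id 2 has a single row stamped as deleted.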
The design aims at getting the latest records as those where local_modstamp is null and del_modstamp is null. However, I ran into an issue when trying to retrieve point-in-time records using this query (the inner-most query):
select id, max(modstamp) from test where modstamp <= <ref_time> and (del_modstamp is null or del_modstamp > <ref_time>) group by id;
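For illustration, here is one possible reading of the point-in-time lookup, sketched with sqlite3 and only generic SQL. It is not a confirmed solution. It assumes (id, modstamp) is unique, that a row deleted at del_modstamp should stop being visible from that moment on, and that the delete check belongs outside the grouped subquery so that an older version of a deleted id does not resurface. The sample rows and the helper name as_of are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table test (id integer not null, modstamp text not null, "
    "local_modstamp text, del_modstamp text)"
)
conn.executemany(
    "insert into test values (?, ?, ?, ?)",
    [
        (1, "2016-08-01 15:35:32", "2016-08-06 00:00:00", None),  # superseded
        (1, "2016-08-05 09:00:00", None, None),                   # current
        (2, "2016-07-29 13:39:45", None, "2016-08-06 00:00:00"),  # deleted
        (3, "2016-07-21 10:15:09", None, None),                   # current
    ],
)

# Inner query: latest version per id whose modstamp is at or before ref_time.
# Outer filter: drop ids whose selected version was already deleted by then.
AS_OF_QUERY = """
select t.id, t.modstamp
from test t
join (
    select id, max(modstamp) as modstamp
    from test
    where modstamp <= :ref_time
    group by id
) latest on latest.id = t.id and latest.modstamp = t.modstamp
where t.del_modstamp is null or t.del_modstamp > :ref_time
order by t.id
"""


def as_of(conn, ref_time):
    return conn.execute(AS_OF_QUERY, {"ref_time": ref_time}).fetchall()


before = as_of(conn, "2016-08-03 00:00:00")
after = as_of(conn, "2016-08-07 00:00:00")
print(before)
print(after)
```

In this sketch the null values do not get in the way: a null del_modstamp simply means "not deleted yet". Note that it compares ref_time against modstamp (the source timestamp) for versions but against del_modstamp (the local process time) for deletions, which may or may not be the intended semantics.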
It seems that I have made a mistake (have I?) by using null as a placeholder to identify the latest records in the table. Is there a way to use the existing design to obtain the point-in-time records?
If not, I guess the probable solution is to set local_modstamp on the latest records as well. This would require changing the update logic to use max(local_modstamp). Can I keep my existing architecture and still retrieve the point-in-time data?
I am using SQL Server right now, but this design may be extended to other database products too. I would prefer a general approach to retrieving the data over vendor-specific hacks.