Let's say, for instance, that I have a Cloud environment and a Client environment, and I want to sync a large amount of data from the cloud to the client. Say I have a DB table in the cloud named `Files`, and I want the exact same table to exist in the client environment.
Now let's assume a few things:
- The `Files` table is very big.
- The data of each row in `Files` can be updated at any time, and each row has a `last-update` column.
- I want to fetch the deltas and make sure the two environments stay identical.
My solution:
- I do a full sync first, returning all the entries to the client.
- I keep the `LastSync` time in the client environment and keep syncing deltas from that `LastSync` time onwards.
- I do both the full sync and the delta syncs using paging: the client fires a first request to get the `Count` of results for the delta, then as many further requests as the `Page Size` of each request requires.
For example, the count:
SELECT COUNT(*) FROM files WHERE last_update > @LastSyncTime
The page fetching:
SELECT col1, col2..
FROM files
WHERE last_update > @LastSyncTime
ORDER BY files.id
LIMIT @LIMIT
OFFSET @OFFSET
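For illustration, the client-side loop for this count-then-pages scheme could look something like the following. This is a minimal sketch using SQLite; `fetch_delta`, `PAGE_SIZE`, and the integer `last_update` values are my own stand-ins, not part of the real schema:

```python
import sqlite3

PAGE_SIZE = 2  # tiny page size just to exercise the paging loop

def fetch_delta(conn, last_sync_time):
    """Fetch all rows updated after last_sync_time, page by page."""
    # First request: the Count of results for the delta.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM files WHERE last_update > ?",
        (last_sync_time,),
    ).fetchone()

    # Then as many page requests as the count / page size requires.
    rows = []
    for offset in range(0, count, PAGE_SIZE):
        page = conn.execute(
            "SELECT id, last_update FROM files "
            "WHERE last_update > ? ORDER BY id LIMIT ? OFFSET ?",
            (last_sync_time, PAGE_SIZE, offset),
        ).fetchall()
        rows.extend(page)
    return rows

# Example usage with made-up data:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, last_update INTEGER)")
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [(1, 900), (2, 1100), (3, 1200), (4, 1300)])
print(fetch_delta(conn, 1000))  # rows 2, 3 and 4
```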
My problem:
What if, for example, the first fetch (the `Count` fetch) takes some time (a few minutes, say), and during that time more entries are updated or added that now match the `last-update` filter?
For example:
- The `Count` fetch returned 100 entries for `last-update > 1000` seconds.
- 1 entry is updated while the `Count` is being fetched.
- Now the same `last-update > 1000` filter matches 101 entries.
- The page fetches, ordered by `id`, will only return 100 of the 101 entries.
- 1 entry is missed and never synced to the client.
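The race above can be reproduced in a few lines. This is a sketch using SQLite with made-up row values; the new row gets the highest `id`, so with a stale count it always falls off the end of the last page:

```python
import sqlite3

PAGE_SIZE = 50

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, last_update INTEGER)")
# 100 rows already updated after the last sync time (1000).
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [(i, 1001) for i in range(1, 101)])

# The Count fetch sees 100 matching rows.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM files WHERE last_update > 1000").fetchone()

# ...while the count is in flight, one more matching row appears.
conn.execute("INSERT INTO files VALUES (101, 1002)")

# The page fetches are driven by the now-stale count of 100.
fetched = []
for offset in range(0, count, PAGE_SIZE):
    fetched += conn.execute(
        "SELECT id FROM files WHERE last_update > 1000 "
        "ORDER BY id LIMIT ? OFFSET ?", (PAGE_SIZE, offset)).fetchall()

print(len(fetched))  # 100 -- row 101 matches the filter but is never fetched
```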
I have tried 2 other options:
- Syncing with a from-to date limit on `last-update`.
- Ordering by `last-update` instead of the `id` column.
I see issues in both options.
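Roughly, those two options look like the following queries. Again a sketch with SQLite and made-up data; `window_start`/`window_end` are assumed parameters, captured when the sync starts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, last_update INTEGER)")
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [(1, 1100), (2, 1050), (3, 1200)])

# Option 1: a from-to window on last_update; window_end is fixed at the
# start of the sync, so later updates fall into the next window instead.
window_start, window_end = 1000, 1150
opt1 = conn.execute(
    "SELECT id FROM files "
    "WHERE last_update > ? AND last_update <= ? ORDER BY id",
    (window_start, window_end)).fetchall()

# Option 2: order the pages by last_update instead of id, so rows updated
# mid-sync sort to the end rather than shifting the earlier pages.
opt2 = conn.execute(
    "SELECT id FROM files WHERE last_update > ? "
    "ORDER BY last_update LIMIT ? OFFSET ?", (1000, 2, 0)).fetchall()

print(opt1, opt2)
```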