
I'm working on a system that mirrors remote datasets using initials and deltas. When an initial comes in, the system mass-deletes anything preexisting and mass-inserts the fresh data. When a delta comes in, the system does a bunch of work to translate it into updates, inserts, and deletes. Initials and deltas are processed inside long transactions to maintain data integrity.

Unfortunately the current solution isn't scaling very well. The transactions are so large and long running that our RDBMS bogs down with various contention problems. Also, there isn't a good audit trail for how the deltas are applied, making it difficult to troubleshoot issues causing the local and remote versions of the dataset to get out of sync.

One idea is to not run the initials and deltas in transactions at all, and instead to attach a version number to each record indicating which delta or initial it came from. Once an initial or delta is successfully loaded, the application can be alerted that a new version of the dataset is available.

This just leaves the issue of how exactly to compose a view of a dataset up to a given version from the initial and deltas. (Apple's Time Machine does something similar, using hard links on the file system to create a "view" of a certain point in time.)
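To make the idea concrete, here is a minimal sketch of the versioned-load approach described above, using Python's sqlite3 for illustration. The table and column names (`records`, `current_version`, `version_id`) are assumptions, not from the question: rows are loaded tagged with the version they came from, and the only step that needs atomicity is the small write that publishes the new version to readers.

```python
import sqlite3

# Hypothetical schema: readers only see rows whose version_id is at or
# below the published current_version, so a load in progress is invisible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (record_id INTEGER, payload TEXT, version_id INTEGER);
CREATE TABLE current_version (version_id INTEGER);
INSERT INTO current_version VALUES (0);
""")

def load_batch(version_id, rows):
    # The bulk inserts can run outside any long transaction: until the
    # version is published, readers ignore these rows.
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [(rid, payload, version_id) for rid, payload in rows])
    # Publishing the version is the single small atomic step, after which
    # the application can be alerted that a new version is available.
    conn.execute("UPDATE current_version SET version_id = ?", (version_id,))
    conn.commit()

def visible_rows():
    # Readers compose their view from everything at or below the
    # published version.
    return conn.execute("""
        SELECT record_id, payload FROM records
        WHERE version_id <= (SELECT version_id FROM current_version)
        ORDER BY record_id
    """).fetchall()
```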

Does anyone have experience solving this kind of problem or implementing this particular solution?

Thanks!


2 Answers


Have one writer database and several reader databases. You send writes to the one writer database and have it propagate exactly the same changes to all the others. The reader databases will be eventually consistent, and updates propagate very quickly. I've seen this done in environments that get over a million page views a day; it's very scalable. You can even put a hardware router in front of all the read databases to load-balance across them.
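The single-writer / many-readers split described above can be sketched with a small routing layer. This is a hypothetical illustration, not part of any library: writes go to the one primary, reads round-robin across replicas.

```python
import itertools

class RoutedPool:
    """Route statements to a single writer or a rotating set of readers.

    The 'connections' here are just placeholder strings; in a real system
    they would be database connection handles.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self._readers = itertools.cycle(replicas)

    def connection_for(self, sql):
        # Crude classification for the sketch: anything that is not a
        # SELECT goes to the primary, which replicates the change to the
        # readers asynchronously.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._readers)
        return self.primary
```

Because replication is asynchronous, readers may briefly lag the writer, which is the "eventually consistent" trade-off mentioned above.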

answered 2011-05-13T16:46:15.983

Thanks to those who gave it a shot.

For anyone else who ends up here: I'm benchmarking a solution that adds "dataset_version_id" and "dataset_version_verb" columns to each table in question. A correlated subquery inside a stored procedure is then used to retrieve the current dataset_version_id when retrieving specific records. If the latest version of a record has a dataset_version_verb of "delete", it's filtered out of the results by a WHERE clause.
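A minimal sketch of that query shape, again using sqlite3 for illustration: the `dataset_version_id` and `dataset_version_verb` columns are from the answer above, while the `items` table and its other columns are assumed names. For each record, a correlated subquery picks the newest version at or below the requested one, and a WHERE clause drops records whose latest verb is "delete".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    item_id INTEGER,
    payload TEXT,
    dataset_version_id INTEGER,
    dataset_version_verb TEXT
);
-- version 1: initial load
INSERT INTO items VALUES (1, 'alpha', 1, 'insert');
INSERT INTO items VALUES (2, 'beta',  1, 'insert');
-- version 2: delta updates item 1 and deletes item 2
INSERT INTO items VALUES (1, 'alpha2', 2, 'update');
INSERT INTO items VALUES (2, NULL,     2, 'delete');
""")

def view_as_of(version):
    # The correlated subquery finds, per item, its newest row at or below
    # the requested version; the outer WHERE filters out deletions.
    return conn.execute("""
        SELECT i.item_id, i.payload
        FROM items i
        WHERE i.dataset_version_id = (
            SELECT MAX(dataset_version_id)
            FROM items
            WHERE item_id = i.item_id
              AND dataset_version_id <= ?
        )
        AND i.dataset_version_verb <> 'delete'
        ORDER BY i.item_id
    """, (version,)).fetchall()
```

Asking version 1 yields both original rows; asking version 2 yields only item 1 with its updated payload, since item 2's latest verb is "delete".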

So far this approach carries an average performance penalty of about 80%, which may be acceptable for our purposes.

answered 2011-05-16T18:03:55.147