
I have a database table (call it Fields) with about 35 columns. Eleven of them always contain the same constant values across all ~300,000 rows - they effectively act as metadata.

The drawback of this structure is that when I need to update those 11 column values, I have to update all 300,000 rows.

I could move all the common data into a different table and update it in only one place, instead of 300,000 places.

However, if I do that, then when I display the fields I need an INNER JOIN between the two tables, which I know makes the SELECT statement slower.

I should say that updating the columns happens far less often than reading (displaying) the data.

How would you suggest I store the data in the database for the best performance?
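The proposed split can be sketched like this, using SQLite and made-up table and column names (`metadata`, `schema_version`, `source_system` are illustrative stand-ins for the real 11 constant columns). Updating the shared values then touches exactly one row instead of 300,000:

```python
import sqlite3

# Normalized sketch: the 11 constant columns move into a small metadata
# table; each Fields row just references it. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metadata (
        id INTEGER PRIMARY KEY,
        schema_version TEXT,   -- one of the 11 constant columns
        source_system  TEXT    -- another constant column
    )""")
conn.execute("""
    CREATE TABLE fields (
        id INTEGER PRIMARY KEY,
        value TEXT,                          -- stands in for the other 24 columns
        metadata_id INTEGER REFERENCES metadata(id)
    )""")

conn.execute("INSERT INTO metadata VALUES (1, 'v1', 'legacy')")
conn.executemany(
    "INSERT INTO fields (value, metadata_id) VALUES (?, 1)",
    [(f"row {i}",) for i in range(1000)])  # 1,000 rows stand in for 300,000

# Changing a shared value now updates exactly one row, not every row.
cur = conn.execute("UPDATE metadata SET schema_version = 'v2' WHERE id = 1")
print(cur.rowcount)  # → 1
```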


4 Answers


I could move all the common data into a different table and update it in only one place, instead of 300,000 places.

That is sound database design and standard normalization.

This is not about "many empty fields", but about the sheer amount of redundant data. You should isolate the constants into a separate table. This may well make things faster, too - it lets the database use its memory more efficiently, because your database becomes much smaller.

Answered 2013-01-28T11:56:03.980

I suggest using a separate table, unless there is something important you haven't told us (it is of course best to try it and measure, but I suspect you already know the answer).

You may actually get faster SELECTs as well: joining against a small table is cheaper than fetching the same data 300,000 times.
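The point about the JOIN can be sketched as follows (SQLite, with hypothetical names): the INNER JOIN reconstructs the same wide rows the denormalized table would hold, while the shared values are stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metadata (id INTEGER PRIMARY KEY, source_system TEXT);
    CREATE TABLE fields (id INTEGER PRIMARY KEY, value TEXT,
                         metadata_id INTEGER REFERENCES metadata(id));
    INSERT INTO metadata VALUES (1, 'legacy');
    INSERT INTO fields (value, metadata_id) VALUES ('a', 1), ('b', 1);
""")

# The JOIN yields the same wide rows the denormalized table would,
# but 'legacy' lives in one row of the small table, not in every row.
rows = conn.execute("""
    SELECT f.id, f.value, m.source_system
    FROM fields f
    INNER JOIN metadata m ON m.id = f.metadata_id
""").fetchall()
print(rows)  # → [(1, 'a', 'legacy'), (2, 'b', 'legacy')]
```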

Answered 2013-01-28T11:51:53.443

This is a classic example of denormalized design. Denormalization is sometimes done for (SELECT) performance, but it should always be done deliberately and measurably. Have you actually measured whether you gain any performance from it?

If your data fits into the cache, and/or the JOIN is unusually expensive¹, then there may well be some performance benefit from avoiding the JOIN. However, the denormalized data is larger and will push at the limits of your cache sooner, increasing I/O and likely reversing any gains you reaped from avoiding the JOIN - you might actually lose performance.

And of course, getting incorrect data is useless, no matter how quickly you can get it. Denormalization makes your database less resilient to data inconsistencies², and the performance difference would have to be pretty dramatic to justify that risk.


¹ Which doesn't look to be the case here.

² E.g. have you considered what happens in a concurrent environment, where one application modifies the existing rows while another application inserts a new row with the old values (since the first application hasn't committed yet, there is no way for the second one to know there was a change)?
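Footnote 2 can be simulated in a single process (SQLite, hypothetical names): an UPDATE over all rows, followed by an INSERT that still carries the old value, leaves the denormalized table internally inconsistent - the "constant" column now disagrees with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized layout: the 'constant' column is repeated in every row.
conn.execute(
    "CREATE TABLE fields (id INTEGER PRIMARY KEY, value TEXT, schema_version TEXT)")
conn.executemany(
    "INSERT INTO fields (value, schema_version) VALUES (?, 'v1')",
    [("a",), ("b",)])

# Application 1 bumps the shared value in every existing row...
conn.execute("UPDATE fields SET schema_version = 'v2'")
# ...while application 2, unaware of the change, inserts with the old value.
conn.execute("INSERT INTO fields (value, schema_version) VALUES ('c', 'v1')")

# The supposedly constant column now holds two different values at once.
distinct = conn.execute(
    "SELECT DISTINCT schema_version FROM fields ORDER BY 1").fetchall()
print(distinct)  # → [('v1',), ('v2',)]
```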

Answered 2013-01-28T20:20:37.133

The best approach is to separate the data and form a second table from those 11 columns - call it a master-data table - which will have its own primary key.

That primary key can then be referenced as a foreign key from the 300,000 rows of the first table.
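The primary-key/foreign-key wiring described above looks like this (SQLite, hypothetical names; note SQLite only enforces foreign keys once `PRAGMA foreign_keys` is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    "CREATE TABLE master_data (id INTEGER PRIMARY KEY, source_system TEXT)")
conn.execute("""
    CREATE TABLE fields (
        id INTEGER PRIMARY KEY,
        value TEXT,
        master_id INTEGER NOT NULL REFERENCES master_data(id)
    )""")
conn.execute("INSERT INTO master_data VALUES (1, 'legacy')")
conn.execute("INSERT INTO fields (value, master_id) VALUES ('a', 1)")  # OK

# A row pointing at a non-existent master record is rejected.
try:
    conn.execute("INSERT INTO fields (value, master_id) VALUES ('b', 99)")
except sqlite3.IntegrityError as e:
    print(e)  # → FOREIGN KEY constraint failed
```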

Answered 2013-02-16T20:28:05.990