orm - Primary keys - native, sequence, or GUID keys?

Question

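After reading this and this, and then reading this (which, ironically, cites the other two), I found myself wondering how big the debate on this topic really is. I'm a SQL Server person, so I tend to use auto-generated identity keys of type int. However, when I know I'll need some form of server-to-server replication or client-server synchronization, I tend to use a GUID as my key.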

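Question: Should I always use GUIDs as the primary key for all my tables, just in case I need that potential scalability? Does this make my schema more flexible, since it could then be migrated between platforms at any time? And does it help me keep my ORM (whatever its flavor) flexible by not embedding platform-specific features?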

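Response:

@David Archer: Based on your comment, I've updated my post so it no longer says "natural keys". You're right about how natural keys are defined. Thanks for the correction.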

score 4 · Accepted Answer

I tend to prefer application-generated primary keys, typically using the hi/lo algorithm as implemented by NHibernate (when I'm using it on a project). Otherwise, sequential GUIDs work just as well. This isn't just my advice; it's also the advice of several people who have been doing this whole development thing a lot longer than I have.

The problem I see with database-generated primary keys is that you have to hit the database to get those identity values, instead of having everything set up before you persist it. For the same reason, it typically breaks NHibernate's Unit of Work pattern as well. If you're not using the UoW pattern in your app, then obviously this drawback doesn't apply.
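For readers unfamiliar with hi/lo, here is a minimal sketch of the idea. It is not NHibernate's implementation: the `HiLoGenerator` class and the `fetch_next_hi` callback are hypothetical, and the callback stands in for the single database round trip that reserves a new "hi" block. Ids are computed as `hi * (max_lo + 1) + lo`, so the application can assign keys to many new objects before anything is flushed.

```python
import itertools
from typing import Callable


class HiLoGenerator:
    """Hypothetical hi/lo id generator (illustrative sketch only).

    `fetch_next_hi` stands in for the one database call that atomically
    increments and returns a "hi" counter. Each hi value reserves a block
    of `max_lo + 1` ids that are then handed out purely in memory.
    """

    def __init__(self, fetch_next_hi: Callable[[], int], max_lo: int = 100):
        self._fetch_next_hi = fetch_next_hi
        self._max_lo = max_lo
        self._hi = 0
        self._lo = max_lo + 1  # forces a fetch on the first call

    def next_id(self) -> int:
        if self._lo > self._max_lo:
            # Block exhausted: one round trip reserves the next block.
            self._hi = self._fetch_next_hi()
            self._lo = 0
        new_id = self._hi * (self._max_lo + 1) + self._lo
        self._lo += 1
        return new_id


# Simulated "database" counter; in a real system this would be a row in a
# table, incremented in its own short transaction.
hi_counter = itertools.count(1)
round_trips = 0


def fetch_next_hi() -> int:
    global round_trips
    round_trips += 1
    return next(hi_counter)


gen = HiLoGenerator(fetch_next_hi, max_lo=9)
ids = [gen.next_id() for _ in range(25)]
print(ids[:3], ids[-1], round_trips)  # [10, 11, 12] 34 3
```

Twenty-five ids cost three round trips here; with a larger `max_lo` the ratio improves further, at the price of gaps in the sequence when a process restarts mid-block.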

If you are using GUIDs for your PK, you definitely want to use sequential GUIDs to eliminate index fragmentation. This also gives you the "rough sort order" that another poster mentioned, although I'd typically have a DateInserted column or similar for that kind of thing.
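One way to get a "sequential" GUID in application code is to put a timestamp in the most significant bits and fill the rest with random data. The sketch below follows the UUIDv7 layout (48-bit Unix milliseconds, version and variant bits, random bits); the `sequential_guid` function is hypothetical. Note that SQL Server is known to compare `uniqueidentifier` values starting from the last byte group, so SQL Server-oriented schemes such as `NEWSEQUENTIALID()` or NHibernate's `guid.comb` generator place their ordered component to suit that comparison rather than using this layout.

```python
import secrets
import time
import uuid
from typing import Optional


def sequential_guid(unix_ms: Optional[int] = None) -> uuid.UUID:
    """UUIDv7-style value: 48-bit millisecond timestamp, then random bits.

    Because the timestamp occupies the most significant bits, values made
    in a later millisecond compare greater (as integers and byte-wise).
    Within the same millisecond, ordering is random: a "rough" sort order.
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit timestamp
    value |= 0x7 << 76                          # version 7
    value |= secrets.randbits(12) << 64         # rand_a
    value |= 0b10 << 62                         # RFC 4122 variant
    value |= secrets.randbits(62)               # rand_b
    return uuid.UUID(int=value)


earlier = sequential_guid(1_700_000_000_000)
later = sequential_guid(1_700_000_000_001)
print(earlier < later, earlier.version)  # True 7
```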

Joining on a GUID column has been shown to have a fairly minimal performance overhead versus your 4-byte integer and I'd venture to say that for non-large datasets, the performance difference is trivial.

Natural keys are the spawn of the devil. :)

score 3 · Accepted Answer

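You probably shouldn't use raw GUIDs as primary keys. Doing so leads to heavy data fragmentation. SQL Server has a feature that gives you "sequential GUIDs" to help mitigate this. There's a good in-depth discussion of this topic here. Another good discussion is here...

It shows that the amount of fragmentation with random GUIDs is significant (it suggests that "fragmentation percent" should be as close to zero as possible). Random GUIDs used 40% more pages, and less space was used on each page, so the disk space required goes up.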
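To see why random keys inflate the page count, here is a toy model of an index's leaf level. It assumes fixed-capacity pages, a 50/50 split when a full page receives a key in the middle of its range, and a fresh page (no split) when a key lands beyond the current maximum. The `simulate_leaf_pages` function is hypothetical and is not SQL Server's storage engine, so it won't reproduce the article's exact figures, but it shows the same effect: more pages, each less full.

```python
import bisect
import random


def simulate_leaf_pages(keys, capacity=100):
    """Toy model of an index leaf level; returns (page_count, avg_fill)."""
    pages = []  # each page is a sorted list of keys
    lows = []   # lows[i] == pages[i][0], used to locate the target page
    for key in keys:
        if not pages:
            pages.append([key])
            lows.append(key)
            continue
        i = max(bisect.bisect_right(lows, key) - 1, 0)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Ever-increasing key: start a new page instead of splitting.
            pages.append([key])
            lows.append(key)
            continue
        else:
            # Insert into the middle of a full page: split it 50/50.
            half = capacity // 2
            right = page[half:]
            del page[half:]
            pages.insert(i + 1, right)
            lows.insert(i + 1, right[0])
            bisect.insort(page if key < right[0] else right, key)
        lows[i] = pages[i][0]
    used = sum(len(p) for p in pages)
    return len(pages), used / (len(pages) * capacity)


rng = random.Random(42)
n = 10_000
seq_pages, seq_fill = simulate_leaf_pages(range(n))
rand_pages, rand_fill = simulate_leaf_pages(rng.getrandbits(128) for _ in range(n))
print("sequential:", seq_pages, "pages, fill", round(seq_fill, 2))  # 100 pages, fill 1.0
print("random:    ", rand_pages, "pages, fill", round(rand_fill, 2))
```

Sequential keys only ever append, so every page ends up full; random keys keep splitting pages in the middle of the index and leave many of them roughly half to two-thirds full.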

score 2 · Accepted Answer

I'd avoid GUIDs for primary keys unless you know you are really going to need them (i.e. for multi-system synchronization, etc.).

In the land of SQL Server replication, a GUID column is added to rows in replicated tables to achieve uniqueness, so it's quite possible to adopt this design later if you need it.

As well as fragmentation, consider the cost in disk space. If a table will stay under 10,000 rows, this is probably not a huge problem, but if your system has to support more than 10,000 rows in a table, you'll find that performance, disk storage cost, and index fragmentation are better served by BIGINT (large integer) identity (autonumber) keys, which scale well with volume.
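The "scales well with volume" point can be made concrete with a little arithmetic on the type ranges: SQL Server's INT tops out at 2,147,483,647 and BIGINT at 9,223,372,036,854,775,807. The sustained insert rate below is a hypothetical figure chosen only for illustration.

```python
INT_MAX = 2**31 - 1      # SQL Server INT upper bound
BIGINT_MAX = 2**63 - 1   # SQL Server BIGINT upper bound
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def time_to_exhaust(max_value: int, inserts_per_second: float) -> float:
    """Approximate seconds until an identity seeded at 1 (increment 1) runs out."""
    return max_value / inserts_per_second


rate = 1_000  # hypothetical sustained insert rate (rows per second)
print(f"INT:    {time_to_exhaust(INT_MAX, rate) / SECONDS_PER_DAY:.1f} days")       # 24.9 days
print(f"BIGINT: {time_to_exhaust(BIGINT_MAX, rate) / SECONDS_PER_YEAR:.3g} years")  # 2.92e+08 years
```

At that rate an INT identity is used up in under a month, while BIGINT, at 8 bytes per key, is still half the size of a 16-byte GUID.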

I'd avoid natural keys altogether - even the risk of logic changing around them makes it too risky IMHO (e.g. if they suddenly become non-unique).

score 2 · Accepted Answer

I agree with most of the other answerers that you should avoid GUIDs as your clustered key in SQL Server. If you really want to, you could use them as the primary key, but don't cluster your table on it.

The primary key is the logical concept of a key to uniquely identify each row - here, a GUID can make sense since it's pretty much guaranteed to be unique.

But the clustered key is a physical concept which physically orders the rows in the table, and here due to their random nature, GUIDs are poorly suited. This will lead to massive index fragmentation and thus to poor performance, even if you keep reorganizing your index (and thus table data) over and over again.

Furthermore, since the clustered index key is being used as the lookup value to find the row in the table, it will be added to each and every entry of each and every non-clustered index on your table, too, and here the size of the GUID (16 bytes) vs. INT (4 bytes) comes into play - you potentially waste a lot of space just for keeping track of the lookup values.
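A back-of-the-envelope estimate of that overhead is sketched below. The row and index counts are hypothetical, and the `clustering_key_overhead` helper ignores row headers, non-leaf index levels, and compression; it only counts the bytes spent storing the clustering key once in the clustered index and once more per row in each nonclustered index.

```python
KEY_BYTES = {"INT": 4, "BIGINT": 8, "UNIQUEIDENTIFIER": 16}


def clustering_key_overhead(rows: int, nonclustered_indexes: int, key_type: str) -> int:
    """Bytes spent on the clustering key: once in the clustered index plus
    once per row in every nonclustered index (other overhead ignored)."""
    return rows * (1 + nonclustered_indexes) * KEY_BYTES[key_type]


rows, ncis = 1_000_000, 4  # hypothetical table: 1M rows, 4 nonclustered indexes
for key_type in KEY_BYTES:
    mb = clustering_key_overhead(rows, ncis, key_type) / 1_000_000
    print(f"{key_type:>16}: {mb:.0f} MB")  # 20, 40, and 80 MB respectively
```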

The best discussion of primary/clustered indexes and GUIDs I know of is a couple of articles by Kim Tripp, the Queen of Indexing in SQL Server land. Check them out!

Her ultimate requirements for a clustered index key are: small, stable, unique, and hopefully ever-increasing. GUIDs violate two of those (small and ever-increasing). Even the GUIDs generated by the NEWSEQUENTIALID() function in SQL Server aren't totally and truly sequential, so I wouldn't use those either.

Marc

score 1 · Accepted Answer

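I've had "natural keys" change or get duplicated too many times to consider using them. Whether I use a sequence or a GUID as a key depends on whether I expect to have to read or say one of them aloud.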

score 1 · Accepted Answer

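I don't have much experience with this, but joining on GUIDs makes me cringe. 4 bytes vs. 36 bytes (the length of a GUID's text form; stored natively as a uniqueidentifier it is 16 bytes) seems gross.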
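The size comparison depends on how the GUID is stored, which Python's standard `uuid` module makes easy to check: the canonical hyphenated text form is 36 characters, while the raw value is 16 bytes, still four times a 4-byte integer.

```python
import uuid

guid = uuid.uuid4()
as_text = str(guid)                   # canonical 8-4-4-4-12 hex form with hyphens
as_binary = guid.bytes                # raw 128-bit value
as_int32 = (42).to_bytes(4, "little") # a 4-byte integer key, for comparison

print(len(as_text), len(as_binary), len(as_int32))  # 36 16 4
```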

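However, I have started using GUIDs as public identifiers rather than as the identity field itself. Look at the URL above: 1156712. If for some reason SO had to merge with another similar application (say SU), those question ids would collide, or one of them would have to change its URLs, breaking any hard-coded links and probably the Google stats too. Whereas if each element is publicly identified by a GUID while internal joins use an int or bigint field, you get the best of both worlds.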
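A minimal sketch of this public-GUID/internal-int pattern, using Python's built-in `sqlite3` module so it runs anywhere. The table and column names are hypothetical; in SQL Server the `public_id` column would more likely be a `uniqueidentifier` with a unique nonclustered index. Joins use the small integer key, and the GUID is only used at the boundary (URLs, APIs).

```python
import sqlite3
import uuid
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE question (
        id        INTEGER PRIMARY KEY,   -- internal key, used for joins
        public_id TEXT NOT NULL UNIQUE,  -- GUID exposed in URLs
        title     TEXT NOT NULL
    );
    CREATE TABLE answer (
        id          INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL REFERENCES question(id),
        body        TEXT NOT NULL
    );
    """
)


def add_question(title: str) -> str:
    """Insert a question and return its public GUID."""
    public_id = str(uuid.uuid4())
    conn.execute("INSERT INTO question (public_id, title) VALUES (?, ?)", (public_id, title))
    return public_id


def answers_for(public_id: str) -> List[str]:
    """Filter on the public GUID, but join on the integer key."""
    rows = conn.execute(
        """
        SELECT a.body
        FROM question q JOIN answer a ON a.question_id = q.id
        WHERE q.public_id = ?
        ORDER BY a.id
        """,
        (public_id,),
    ).fetchall()
    return [body for (body,) in rows]


qid = add_question("Native, sequence, or GUID keys?")
internal_id = conn.execute("SELECT id FROM question WHERE public_id = ?", (qid,)).fetchone()[0]
conn.execute("INSERT INTO answer (question_id, body) VALUES (?, ?)", (internal_id, "First answer"))
print(answers_for(qid))  # ['First answer']
```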

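Merging is still possible with this approach. If a collision is found, new internal identifiers can be generated on the fly without disrupting the rest of the application.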
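The merge step described above might look like the following sketch. All names are hypothetical: rows from a source database are copied into a target whose integer ids overlap with them, each copied row gets a fresh internal id from the target, and child rows are re-pointed through a map from old ids to new ones. The public GUIDs are copied unchanged, so external links keep working.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Question:
    id: int          # internal key, unique only within one database
    public_id: str   # GUID exposed to the outside world
    title: str


@dataclass
class Answer:
    id: int
    question_id: int  # refers to Question.id in the same database
    body: str


def merge(target_q: List[Question], target_a: List[Answer],
          source_q: List[Question], source_a: List[Answer]) -> None:
    """Append source rows to target, re-keying internal ids; GUIDs untouched."""
    next_q = max((q.id for q in target_q), default=0) + 1
    next_a = max((a.id for a in target_a), default=0) + 1
    id_map: Dict[int, int] = {}  # source question id -> new target id
    for q in source_q:
        id_map[q.id] = next_q
        target_q.append(Question(next_q, q.public_id, q.title))
        next_q += 1
    for a in source_a:
        target_a.append(Answer(next_a, id_map[a.question_id], a.body))
        next_a += 1


so_q = [Question(1, str(uuid.uuid4()), "SO question")]
so_a = [Answer(1, 1, "SO answer")]
su_q = [Question(1, str(uuid.uuid4()), "SU question")]  # same internal id 1
su_a = [Answer(1, 1, "SU answer")]

merge(so_q, so_a, su_q, su_a)
print([(q.id, q.title) for q in so_q])  # [(1, 'SO question'), (2, 'SU question')]
print([(a.id, a.question_id, a.body) for a in so_a])  # [(1, 1, 'SO answer'), (2, 2, 'SU answer')]
```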
