
I need to store binary files in a varbinary(max) column on SQL Server 2005 like this:

FileInfo

  • FileInfoId int, PK, identity
  • FileText varchar(max) (can be null)
  • FileCreatedDate datetime etc.

FileContent

  • FileInfoId int, PK, FK
  • FileContent varbinary(max)

FileInfo has a one-to-one relationship with FileContent. FileText is meant to be used when there is no file to upload and only text is entered manually for an item. I'm not sure what percentage of items will have a binary file.

Should I create the second table? Would there be any performance improvement with the two-table design? Are there any logical benefits?

I've found this page, but I'm not sure whether it applies in my case.
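For concreteness, the two-table design could be declared roughly as follows. This is only a sketch: the `dbo` schema, the constraint names, and the nullability of `FileCreatedDate` and `FileContent` are assumptions, not part of the design listed above.

```sql
-- Sketch of the two-table design (SQL Server 2005 syntax).
-- Constraint names and NOT NULL choices are assumptions for illustration.
CREATE TABLE dbo.FileInfo (
    FileInfoId      int IDENTITY(1,1) NOT NULL,
    FileText        varchar(max) NULL,
    FileCreatedDate datetime NOT NULL,
    CONSTRAINT PK_FileInfo PRIMARY KEY (FileInfoId)
);

CREATE TABLE dbo.FileContent (
    FileInfoId  int NOT NULL,
    FileContent varbinary(max) NOT NULL,
    CONSTRAINT PK_FileContent PRIMARY KEY (FileInfoId),
    CONSTRAINT FK_FileContent_FileInfo
        FOREIGN KEY (FileInfoId) REFERENCES dbo.FileInfo (FileInfoId)
);
```

The single-table alternative would drop `dbo.FileContent` and add a nullable `FileContent varbinary(max)` column to `dbo.FileInfo`.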


3 Answers

6

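There is no performance or operational advantage. Since SQL 2005, the engine already stores LOB types for you in a separate allocation unit (a separate b-tree). If you study the Table and Index Organization of SQL Server, you'll see that every partition has up to 3 allocation units: data, LOB and row-overflow: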

[Figure: Table organization (source: s-msft.com)]

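A LOB field (varchar(max), nvarchar(max), varbinary(max), XML, CLR UDTs, as well as the deprecated types text, ntext and image) will have only a very small footprint in the data record itself, in the clustered index: a pointer to the LOB allocation unit. See Anatomy of a Record.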
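One way to look at these allocation units yourself is to query the catalog views. The following is a minimal sketch that assumes a table named `dbo.FileInfo`, as in the question; substitute your own table name.

```sql
-- List the allocation units (in-row, LOB, row-overflow) behind a table.
-- dbo.FileInfo is the table name from the question (an assumption here).
SELECT  o.name            AS table_name,
        p.index_id,
        p.partition_number,
        au.type_desc,     -- IN_ROW_DATA, LOB_DATA or ROW_OVERFLOW_DATA
        au.total_pages,
        au.used_pages
FROM    sys.allocation_units AS au
JOIN    sys.partitions AS p
        -- in-row and row-overflow units are keyed by hobt_id,
        -- LOB units by partition_id
        ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
        OR (au.type = 2       AND au.container_id = p.partition_id)
JOIN    sys.objects AS o
        ON o.object_id = p.object_id
WHERE   o.object_id = OBJECT_ID(N'dbo.FileInfo');
```

For a table with a varbinary(max) column, the result should include a `LOB_DATA` row alongside the `IN_ROW_DATA` row for the same partition.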

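By storing a LOB explicitly in a separate table you gain absolutely nothing. You just add unnecessary complexity: updates that used to be atomic now have to be spread across two separate tables, which complicates the application and its transaction structure.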
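To illustrate the extra transaction handling, here is a rough sketch of inserting one item in the two-table design. The `dbo` schema and the placeholder bytes are assumptions; the table and column names follow the question.

```sql
-- Two-table design: two inserts that must succeed or fail together.
DECLARE @content varbinary(max);
DECLARE @newId   int;
DECLARE @msg     nvarchar(2048);
SET @content = 0x255044462D;  -- placeholder bytes for illustration only

BEGIN TRY
    BEGIN TRANSACTION;

    INSERT INTO dbo.FileInfo (FileText, FileCreatedDate)
    VALUES (NULL, GETDATE());

    -- Pick up the identity value generated by the insert above
    SET @newId = SCOPE_IDENTITY();

    INSERT INTO dbo.FileContent (FileInfoId, FileContent)
    VALUES (@newId, @content);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo the header row if the content insert failed
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    SET @msg = ERROR_MESSAGE();
    RAISERROR(N'%s', 16, 1, @msg);
END CATCH;
```

With a single table that also holds the varbinary(max) column, the same work is one `INSERT` statement, which is atomic without an explicit transaction.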

If the LOB content is an entire file, then perhaps you should consider upgrading to SQL 2008 and using FILESTREAM.

answered 2009-09-25T15:17:00.480
2

There is no real logical advantage to this two-table design: since the relationship is 1-1, you could bundle all the information in the FileInfo table. However, there are serious operational and performance advantages, particularly if your binary data averages more than a few hundred bytes in size.

EDIT: As pointed out by Remus Rusanu, on some DBMS implementations such as SQL Server 2005, large object types are transparently stored in a separate allocation unit, which effectively alleviates the practical drawback of having big records. The introduction of this feature implicitly confirms the weakness of the [true] single-table approach.

I merely scanned the SO posting referenced in this question. I generally think that the other posting makes a few valid points, such as intrinsic data integrity (since all CRUD actions on a given item are atomic). On the whole, though, and except in relatively atypical use cases (such as using the item table as a repository that is mostly queried for single items at a time), the performance advantage is with the two-table approach: indexes on the "header" table will be more efficient, queries that do not require the binary data will return much more quickly, and so on.

The two-table approach has further benefits if the design evolves to supply different types of binary objects in different contexts. For example, say these items are images (GIFs, JPGs, etc.). At a later date you may also want to provide a small preview version of these images (and/or a high-resolution version), with the choice driven by context (user preference, low-bandwidth clients, subscriber vs. visitor, etc.). In such a case, the operational issues associated with the single-table approach become more acute, whereas the two-table model becomes more versatile.

answered 2009-09-25T14:43:53.483
0

Purely because of certain SQL Server limitations, it can help to split IMAGE, (N)TEXT, (N)VARCHAR(max) and VARBINARY(max) columns out of wider tables.

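For example, prior to SQL Server 2012 you could not rebuild a clustered table online if it contained LOBs. On the other hand, you may not care about these limitations, in which case it is better practice to lay out the tables as they relate to your data.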
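The restriction concerns statements along the following lines. This is a sketch only: the index name `PK_FileInfo` is hypothetical, and `ONLINE = ON` is only available in editions that support online index operations (such as Enterprise).

```sql
-- Online rebuild of the clustered index.
-- PK_FileInfo is a hypothetical index name used for illustration.
-- On versions before SQL Server 2012 this fails when the table
-- contains LOB columns such as varbinary(max).
ALTER INDEX PK_FileInfo ON dbo.FileInfo
REBUILD WITH (ONLINE = ON);
```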

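If you physically want to keep the LOB data out of the table's in-row allocation unit, you can still set the "large value types out of row" table option.

answered 2017-03-06T20:08:22.810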
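As a sketch, assuming the single-table layout with the table named `dbo.FileInfo` as in the question, the option can be set and checked like this:

```sql
-- Push varchar(max)/nvarchar(max)/varbinary(max)/xml values out of row.
-- dbo.FileInfo is the table name from the question (an assumption here).
EXEC sp_tableoption N'dbo.FileInfo', 'large value types out of row', 1;

-- Check the current setting (1 = stored out of row)
SELECT name, large_value_types_out_of_row
FROM   sys.tables
WHERE  object_id = OBJECT_ID(N'dbo.FileInfo');
```

Changing the option does not move existing values right away; a value's storage changes only when it is next updated.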