sql - Estimate table size in SQL Server

Question

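I need to estimate the database size for prerequisites, so I am trying to understand how SQL Server stores data in the example below.

In my SQL Server database, I have a table named InfoComp that contains 4 columns: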

IdInfoComp : Integer Not Null (PK)  
IdDefinition : Integer Not Null (FK)  
IdObject : Integer Not Null (FK)  
Value : NVarChar(Max) Not Null

I want to estimate the size of the table. In real usage, I can get the average length stored in Value with this SQL query:

SELECT AVG(LEN(Value)) FROM InfoComp  
Result : 8

So my calculation seems to be (in bytes):

(Size(IdInfoComp) + Size(IdDefinition) + Size(IdObject) + AVG Size(Value)) * Rows count

( 4 + 4 + 4 + ((8 * 2) + 2)) * NbRows

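But when I try to apply this calculation to a real case, it is wrong. In my case, I have 3,250,273 rows, so the result should be 92 MB, but MS SQL reports:

(Data) 147 888 KB, (Indexes) 113 072 KB, and (Reserved) 261 160 KB.

Where am I wrong?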

score 3 · Accepted Answer

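Try this... it got me close. I used the MSDN articles to create it. You can set the number of rows. It runs against every table in the database, including indexes. It does not handle columnstore yet, and it does not handle relationships. It simply applies the row count estimate to every table.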

/*Do NOT change this section*/
GO
CREATE TABLE RowSizes (TypeName VARCHAR(30), TableName VARCHAR(255), IndexName VARCHAR(255), Null_Bitmap SMALLINT, VariableFieldSize BIGINT, FixedFieldSize BIGINT, Row_Size BIGINT, LOBFieldSize BIGINT);
CREATE TABLE LeafSizes (TypeName VARCHAR(30), TableName VARCHAR(255), IndexName VARCHAR(255), Row_Size BIGINT, Rows_Per_Page BIGINT, Free_Rows_Per_Page BIGINT, Non_Leaf_Levels BIGINT, Num_Leaf_Pages BIGINT, Num_Index_Pages BIGINT, Leaf_space_used_bytes BIGINT);
GO
CREATE PROCEDURE dbo.cp_CalcIndexPages
    @IndexType VARCHAR(20)
AS
BEGIN
    DECLARE @IndexName VARCHAR(255)
        , @TableName varchar(255)
        , @Non_Leaf_Levels bigint = 127
        , @Rows_Per_Page bigint = 476 
        , @Num_Leaf_Pages bigint =10000;

    WHILE EXISTS(SELECT TOP 1 1 FROM dbo.LeafSizes WHERE TypeName = @IndexType AND Num_Index_Pages = 0)-- AND IndexName = 'PK_ProcessingMessages')
    BEGIN
        SELECT TOP 1 @IndexName = IndexName
            , @TableName = TableName
            , @Non_Leaf_Levels = Non_Leaf_Levels
            , @Rows_Per_Page = Rows_Per_Page
            , @Num_Leaf_Pages = Num_Leaf_Pages
        FROM dbo.LeafSizes
        WHERE TypeName = @IndexType
            AND Num_Index_Pages = 0;

        DECLARE @Counter INT = 1
            , @Num_Index_Pages INT = 0;

        WHILE @Counter <= @Non_Leaf_Levels
        BEGIN
            BEGIN TRY

            SELECT @Num_Index_Pages += ROUND(CASE WHEN @Num_Leaf_Pages/POWER(@Rows_Per_Page, @Counter) < CONVERT(FLOAT, 1) THEN 1 ELSE @Num_Leaf_Pages/POWER(@Rows_Per_Page, @Counter) END, 0)
            END TRY

            BEGIN CATCH
                SET @Num_Index_Pages += 1
            END CATCH

            SET @Counter += 1
        END

        IF @Num_Index_Pages = 0 
            SET @Num_Index_Pages =  1;

        UPDATE dbo.LeafSizes
        SET Num_Index_Pages = @Num_Index_Pages
            , Leaf_space_used_bytes = 8192 * @Num_Index_Pages
        WHERE TableName = @TableName
            AND IndexName = @IndexName;

    END
END
GO
/*Do NOT change above here*/

--Set parameters here
DECLARE @NumRows INT = 1000000 --Number of rows for estimate
    ,@VarPercentFill money = .6; --Percentage of variable field space used to estimate.  1 will provide estimate as if all variable columns are 100% full.


/*Do not change*/
WITH cte_Tables AS (--Get Tables
    SELECT o.object_id, s.name+'.'+o.name AS ObjectName
    FROM sys.objects o
    INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
    WHERE type = 'U'
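), cte_TableData AS (--Calculate Field Sizes; only the listed types are counted, so columns of other types (for example nvarchar or decimal) add 0 bytes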
    SELECT o.ObjectName AS TableName
        , SUM(CASE WHEN t.name IN ('int', 'bigint', 'tinyint', 'char', 'datetime', 'smallint', 'date') THEN 1 ELSE 0 END) AS FixedFields
        , SUM(CASE WHEN t.name IN ('int', 'bigint', 'tinyint', 'char', 'datetime', 'smallint', 'date') THEN c.max_length ELSE 0 END) AS FixedFieldSize
        , SUM(CASE WHEN t.name IN ('varchar') THEN 1 ELSE 0 END) AS VariableFields
        , SUM(CASE WHEN t.name IN ('varchar') THEN c.max_length ELSE 0 END)*@VarPercentFill AS VariableFieldSize
        , SUM(CASE WHEN t.name IN ('xml') THEN 1 ELSE 0 END) AS LOBFields
        , SUM(CASE WHEN t.name IN ('xml') THEN 10000 ELSE 0 END) AS LOBFieldSize
        , COUNT(1) AS TotalColumns
    FROM sys.columns c
    INNER JOIN cte_Tables o ON o.object_id = c.object_id
    INNER JOIN sys.types t ON c.system_type_id = t.system_type_id
    GROUP BY o.ObjectName
), cte_Indexes AS (--Get Indexes and size
    SELECT s.name+'.'+o.name AS TableName
        , ISNULL(i.name, '') AS IndexName
        , i.type_desc
        , i.index_id
        , SUM(CASE WHEN t.name IN ('tinyint','smallint', 'int', 'bigint', 'char', 'datetime', 'date') AND c.key_ordinal > 0 THEN 1 ELSE 0 END) AS FixedFields
        , SUM(CASE WHEN t.name IN ('tinyint','smallint', 'int', 'bigint', 'char', 'datetime', 'date') AND c.key_ordinal > 0 THEN tc.max_length ELSE 0 END) AS FixedFieldSize
        , SUM(CASE WHEN t.name IN ('varchar') AND c.key_ordinal > 0 THEN 1 ELSE 0 END) AS VariableFields
        , SUM(CASE WHEN t.name IN ('varchar') AND c.key_ordinal > 0 THEN tc.max_length ELSE 0 END)*@VarPercentFill AS VariableFieldSize
        , SUM(CASE WHEN t.name IN ('xml') AND c.key_ordinal > 0 THEN 1 ELSE 0 END) AS LOBFields
        , SUM(CASE WHEN t.name IN ('xml') AND c.key_ordinal > 0 THEN 10000 ELSE 0 END) AS LOBFieldSize
        , SUM(CASE WHEN t.name IN ('tinyint','smallint', 'int', 'bigint', 'char', 'datetime', 'date') AND c.is_included_column > 0 THEN 1 ELSE 0 END) AS FixedIncludes
        , SUM(CASE WHEN t.name IN ('tinyint','smallint', 'int', 'bigint', 'char', 'datetime', 'date') AND c.is_included_column > 0 THEN tc.max_length ELSE 0 END) AS FixedIncludesSize
        , SUM(CASE WHEN t.name IN ('varchar') AND c.is_included_column > 0 THEN 1 ELSE 0 END) AS VariableIncludes
        , SUM(CASE WHEN t.name IN ('varchar') AND c.is_included_column > 0 THEN tc.max_length ELSE 0 END)*@VarPercentFill AS VariableIncludesSize
        , COUNT(1) AS TotalColumns
    FROM sys.indexes i
    INNER JOIN sys.columns tc ON i.object_id = tc.object_id
    INNER JOIN sys.index_columns c ON i.index_id = c.index_id 
        AND c.column_id = tc.column_id
        AND c.object_id = i.object_id
    INNER JOIN sys.objects o ON o.object_id = i.object_id AND o.is_ms_shipped = 0
    INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
    INNER JOIN sys.types t ON tc.system_type_id = t.system_type_id
    GROUP BY s.name+'.'+o.name, ISNULL(i.name, ''), i.type_desc, i.index_id
)
INSERT RowSizes
SELECT 'Table' AS TypeName
    , n.TableName
    , '' AS IndexName
    , 2 + ((n.FixedFields+n.VariableFields+7)/8) AS Null_Bitmap
    , 2 + (n.VariableFields * 2) + n.VariableFieldSize AS Variable_Data_Size
    , n.FixedFieldSize
    /*FixedFieldSize + Variable_Data_Size + Null_Bitmap*/
    , n.FixedFieldSize + (2 + (n.VariableFields * 2) + (n.VariableFieldSize)) + (2 + ((n.FixedFields+n.VariableFields+7)/8)) + 4 AS Row_Size
    , n.LOBFieldSize
FROM cte_TableData n
UNION
SELECT i.type_desc
    , i.TableName
    , i.IndexName
    , 0 AS Null_Bitmap
    , CASE WHEN i.VariableFields > 0 THEN 2 + (i.VariableFields * 2) + i.VariableFieldSize + 4 ELSE 0 END AS Variable_Data_Size
    , i.FixedFieldSize
    /*FixedFieldSize + Variable_Data_Size + Null_Bitmap if not clustered*/
    , i.FixedFieldSize + CASE WHEN i.VariableFields > 0 THEN 2 + (i.VariableFields * 2) + i.VariableFieldSize + 4 ELSE 0 END + 7 AS Row_Size
    , i.LOBFieldSize
FROM cte_Indexes i
WHERE i.index_id IN(0,1)
UNION
SELECT i.type_desc
    , i.TableName
    , i.IndexName
    , CASE WHEN si.TotalColumns IS NULL THEN 2 + ((i.FixedFields+i.VariableFields+i.VariableIncludes+i.FixedIncludes+8)/8) 
            ELSE 2 + ((i.FixedFields+i.VariableFields+i.VariableIncludes+i.FixedIncludes+7)/8)
        END AS Null_Bitmap
    , CASE WHEN si.TotalColumns IS NULL THEN 2 + ((i.VariableFields + 1) * 2) + (i.VariableFieldSize + 8)
            ELSE 2 + (i.VariableFields * 2) + i.VariableFieldSize 
        END AS Variable_Data_Size
    /*No clustered index on the table: si.* is NULL here, so use the index's own fixed size*/
    , CASE WHEN si.TotalColumns IS NULL THEN i.FixedFieldSize
            ELSE i.FixedFieldSize + si.FixedFieldSize
        END AS FixedFieldSize
    /*FixedFieldSize + Variable_Data_Size + Null_Bitmap if not clustered*/
    , CASE WHEN si.TotalColumns IS NULL THEN i.FixedFieldSize + (2 + ((i.VariableFields + 1) * 2) + (i.VariableFieldSize + 8)) + (2 + ((i.TotalColumns+8)/8)) + 7
            ELSE i.FixedFieldSize + (2 + (i.VariableFields * 2) + i.VariableFieldSize) + (2 + ((i.TotalColumns+7)/8)) + 4 
        END AS Row_Size
    , i.LOBFieldSize
FROM cte_Indexes i
LEFT OUTER JOIN cte_Indexes si ON i.TableName = si.TableName AND si.type_desc = 'CLUSTERED'
WHERE i.index_id NOT IN(0,1) AND i.type_desc = 'NONCLUSTERED';

--SELECT * FROM RowSizes

/*Calculate leaf sizes for tables and HEAPs*/
INSERT LeafSizes
SELECT r.TypeName
    , r.TableName
    ,'' AS IndexName
    , r.Row_Size
    , 8096 / (r.Row_Size + 2) AS Rows_Per_Page
    /*90 is the assumed fill factor. Multiply before dividing: (100 - 90)/100 is integer division and always yields 0*/
    , 8096 * (100 - 90) / 100 / (r.Row_Size + 2) AS Free_Rows_Per_Page
    , 0 AS Non_Leaf_Levels
    /*Num_Leaf_Pages = Number of Rows / (Rows_Per_Page - Free_Rows_Per_Page) OR 1 if less than 1*/
    , CASE WHEN @NumRows / ((8096 / (r.Row_Size + 2)) - (8096 * (100 - 90) / 100 / (r.Row_Size + 2))) < 1 
            THEN 1 
            ELSE @NumRows / ((8096 / (r.Row_Size + 2)) - (8096 * (100 - 90) / 100 / (r.Row_Size + 2))) 
        END AS Num_Leaf_Pages
    , 0 AS Num_Index_Pages
    /*Leaf_space_used = 8192 * Num_Leaf_Pages*/
    , 8192 * CASE WHEN @NumRows / ((8096 / (r.Row_Size + 2)) - (8096 * (100 - 90) / 100 / (r.Row_Size + 2))) < 1 
                THEN 1 
                ELSE @NumRows / ((8096 / (r.Row_Size + 2)) - (8096 * (100 - 90) / 100 / (r.Row_Size + 2))) 
            END + (@NumRows * LOBFieldSize) AS Leaf_space_used_bytes
FROM RowSizes r
WHERE r.TypeName = 'Table'
ORDER BY TypeName, TableName;

/*Calculate leaf sizes for CLUSTERED indexes*/
INSERT LeafSizes
SELECT r.TypeName
    , r.TableName
    , r.IndexName
    , r.Row_Size
    , 8096 / (r.Row_Size + 2) AS Rows_Per_Page
    , 0 AS Free_Rows_Per_Page
    , 1 + ROUND(LOG(8096 / (r.Row_Size + 2)), 0)*(l.Num_Leaf_Pages/(8096 / (r.Row_Size + 2))) AS Non_Leaf_Levels
    , l.Num_Leaf_Pages
    , 0 AS Num_Index_Pages 
    , 0 AS Leaf_space_used_bytes
FROM RowSizes r
INNER JOIN LeafSizes l ON r.TableName = l.TableName AND l.TypeName = 'Table'
WHERE r.TypeName = 'CLUSTERED';

PRINT 'CLUSTERED'
EXEC dbo.cp_CalcIndexPages @IndexType = 'CLUSTERED'

/*Calculate leaf sizes for NONCLUSTERED indexes*/
INSERT LeafSizes
SELECT r.TypeName
    , r.TableName
    , r.IndexName
    , r.Row_Size
    , 8096 / (r.Row_Size + 2) AS Rows_Per_Page
    , 0 AS Free_Rows_Per_Page
    , 1 + ROUND(LOG(8096 / (r.Row_Size + 2)), 0)*(l.Num_Leaf_Pages/(8096 / (r.Row_Size + 2))) AS Non_Leaf_Levels
    , l.Num_Leaf_Pages
    , 0 AS Num_Index_Pages 
    , 0 AS Leaf_space_used_bytes
FROM RowSizes r
INNER JOIN LeafSizes l ON r.TableName = l.TableName AND l.TypeName = 'Table'
WHERE r.TypeName = 'NONCLUSTERED';

PRINT 'NONCLUSTERED'
EXEC dbo.cp_CalcIndexPages @IndexType = 'NONCLUSTERED'

SELECT * 
FROM dbo.LeafSizes
--WHERE TableName = 'eligibility.clientrequest'

SELECT TableName
    , @NumRows AS RowsPerTable
    , @VarPercentFill*100 AS VariableFieldFillFactor
    , SUM(CASE WHEN TypeName = 'Table' THEN Leaf_space_used_bytes ELSE 0 END)/1024/1024 AS TableSizeMB
    , SUM(Leaf_space_used_bytes)/1024/1024 AS SizeWithIndexesMB
FROM LeafSizes
--WHERE TableName = 'eligibility.clientrequest'
GROUP BY TableName
ORDER BY TableName;


GO
/*Cleanup when done*/
DROP PROCEDURE dbo.cp_CalcIndexPages;
DROP TABLE dbo.RowSizes;
DROP TABLE dbo.LeafSizes;

score 1 · Accepted Answer

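Unfortunately, I can't say why your calculation is wrong, because there isn't enough information about how the table was created and how the database is configured. So I'll try to answer in general terms, which should give you some pointers.

The first thing you should know is that the size of any SQL Server database is greater than or equal to the size of the model database. This is because the model database is the template for new databases, so it is copied every time a CREATE DATABASE statement is executed.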
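
If you want to see that baseline on your own instance, something like the following should show it. This is only a sketch: it assumes you have permission to view server metadata, and the column aliases are mine.

```sql
-- Size of the model database files.
-- sys.master_files.size is a count of 8 KB pages, so pages * 8 / 1024 gives MB.
SELECT mf.name                              AS logical_file_name,
       mf.type_desc                         AS file_type,   -- ROWS or LOG
       CAST(mf.size AS bigint) * 8 / 1024   AS size_mb      -- truncated to whole MB
FROM sys.master_files AS mf
WHERE mf.database_id = DB_ID(N'model');
```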

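All information in a database is stored on disk in 8 KB pages. There are many page types. Some of them (such as allocation maps and metadata) are used for internal purposes, while others store data.

The size of a table depends on how the data is organized on disk (whether it has a clustered index), the column types, and data compression. The size of an index depends on whether a unique index exists on the indexed table, the number of levels in the index, the fill factor, and so on.

As I said before, everything is stored in pages, and data is no exception. SQL Server has pages for in-row data, pages for row-overflow data, and pages for LOB data. A data page consists of three main parts: the page header, the data rows, and the row offset array.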
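
To see how many pages of each kind a table actually uses, a query along these lines may help. It is a sketch rather than a definitive report: N'dbo.InfoComp' assumes the table from the question lives in the dbo schema, and the column aliases are mine.

```sql
-- Pages used by each index (or heap) of one table, split by allocation type.
-- N'dbo.InfoComp' is an assumption: adjust the schema and table name as needed.
SELECT i.name                          AS index_name,    -- NULL for a heap
       i.type_desc                     AS index_type,
       ps.row_count,
       ps.in_row_data_page_count,                        -- leaf-level pages with in-row data
       ps.row_overflow_used_page_count,                  -- pages for row-overflow data
       ps.lob_used_page_count,                           -- pages for LOB data
       ps.used_page_count * 8          AS used_kb,       -- each page is 8 KB
       ps.reserved_page_count * 8      AS reserved_kb
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.indexes AS i
        ON i.object_id = ps.object_id
       AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.InfoComp');
```

Each index gets its own row, so the data pages and the index pages can be compared separately against an estimate.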

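The page header occupies the first 96 bytes of each data page, leaving 8,096 bytes for the other components. The row offset array is a block of 2-byte entries stored at the end of the page. The number of entries is stored in the header and is called the slot count.

The area between the header and the row offset array is where the data rows are stored. Each row consists of two parts: a fixed-size part and a variable-length part.

The structure of a data row is:

Status Bits A, 1 byte
Status Bits B, 1 byte
Fixed-length size (FSize), 2 bytes
Fixed-length data (FData), FSize - 4 bytes
Number of columns (NCol), 2 bytes
NULL bitmap, Ceiling (NCol / 8) bytes
Number of variable-length columns stored in the row (VarCount), 2 bytes
Variable column offset array (VarOffset), 2 * VarCount bytes
Variable-length data (VarData), VarOff[VarCount] - (FSize + 4 + Ceiling (NCol / 8) + 2 * VarCount) bytes

Note that index rows are stored in the same way as data rows.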
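
Applied to the InfoComp table from the question, the row layout can be turned into a rough estimate like the one below. Treat it as a sketch under assumptions that are mine, not the asker's: Value averages 8 characters and is stored in-row, no column is NULL, there is no compression, and every page is completely full. The variable names are made up for the illustration.

```sql
-- Rough data-page estimate for InfoComp, following the row layout described above.
-- Assumptions: Value averages 8 characters stored in-row, no NULLs,
-- no compression, pages 100% full.
DECLARE @NumRows   bigint = 3250273;    -- row count from the question
DECLARE @FixedData int    = 4 + 4 + 4;  -- three int columns
DECLARE @NCol      int    = 4;          -- total number of columns
DECLARE @VarCount  int    = 1;          -- only Value is variable-length
DECLARE @VarData   int    = 8 * 2;      -- nvarchar: 2 bytes per character assumed

DECLARE @RowSize int =
      1 + 1               -- status bits A and B
    + 2                   -- FSize
    + @FixedData          -- FData
    + 2                   -- NCol
    + (@NCol + 7) / 8     -- NULL bitmap: Ceiling(NCol / 8) via integer division
    + 2                   -- VarCount
    + 2 * @VarCount       -- variable column offset array
    + @VarData;           -- VarData

-- Each row also needs a 2-byte entry in the row offset array.
DECLARE @RowsPerPage int    = 8096 / (@RowSize + 2);
DECLARE @Pages       bigint = (@NumRows + @RowsPerPage - 1) / @RowsPerPage;  -- round up

SELECT @RowSize     AS row_size_bytes,
       @RowsPerPage AS rows_per_page,
       @Pages       AS data_pages,
       @Pages * 8   AS data_kb;
```

With these inputs the row comes to 39 bytes rather than 30, which gives 197 rows per page and 16,499 pages, or about 131,992 KB. That is much closer to the reported 147 888 KB of data than the 92 MB figure. The remaining gap would come from things this sketch ignores, such as pages that are not completely full.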

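I haven't explained everything here, but I hope this helps you understand what SQL Server uses its allocated space for. Also, keep in mind that database files grow by the size specified in the FILEGROWTH option, which can make the actual size larger than the estimate.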
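
The current file sizes and growth increments can be checked with a query such as this one. It is a minimal sketch run in the context of the database you are estimating, and the column aliases are mine.

```sql
-- Current size and growth settings for the files of the current database.
-- size is in 8 KB pages. growth is in 8 KB pages unless is_percent_growth = 1,
-- in which case it is a whole-number percentage. growth = 0 means a fixed-size file.
SELECT df.name                              AS logical_file_name,
       df.type_desc                         AS file_type,
       CAST(df.size AS bigint) * 8 / 1024   AS size_mb,
       CASE WHEN df.is_percent_growth = 1
            THEN CAST(df.growth AS varchar(20)) + ' %'
            ELSE CAST(CAST(df.growth AS bigint) * 8 / 1024 AS varchar(20)) + ' MB'
       END                                  AS growth_increment
FROM sys.database_files AS df;
```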

You can also look at the book Microsoft SQL Server 2012 Internals and read about how to estimate the size of a database. You may find it interesting.
