我想通过在关系数据库 (ROLAP) 中使用星型模式来构建一个简单的多维数据模型。为此,我创建了一个事实表和两个维度表。首先,我从操作源复制数据并处理这些数据(一些简化的 ETL 过程)。
在我的模型中只有两个维度:date
和status
。度量:某些状态的数量(一段时间)。
时间维度表:
CREATE TABLE [dbo].[tbl_date_dim] (
[ID][int] IDENTITY(1,1) NOT NULL,
[date_key][int] NOT NULL primary key,
[Year][int] NOT NULL,
[Month][int] NOT NULL,
[Day][int] NOT NULL
);
有一个表——tbl_application
其中存储了整个时间范围(字段VersionDate
)。因此,我这样填写的时间维度表:
INSERT INTO [dbo].[tbl_date_dim]
([date_key],
[Year],
[Month],
[Day])
(
SELECT DISTINCT
CAST(YEAR(VersionDate) as VARCHAR(4)) +
RIGHT('00' + CAST(MONTH(VersionDate) as VARCHAR(2)) ,2) +
RIGHT('00' + CAST(DAY(VersionDate) as VARCHAR(2)), 2) as 'date_key',
YEAR(inner_data.VersionDate) as 'Year',
MONTH(inner_data.VersionDate) as 'Month',
DAY(inner_data.VersionDate) as 'Day'
FROM (
SELECT
VersionDate
FROM [dbo].[tbl_application]
) AS inner_data
);
状态维度表:我使用整个现有表tbl_applicationstatus
。
接下来,我创建一个事实表。它包含维度表和度量的外键。
CREATE TABLE [dbo].[tbl_olap_fact] (
[ID][int] IDENTITY(1,1) NOT NULL,
[status_id][int] NOT NULL, // FK
[date_dim][int] NOT NULL, // FK
[staus_name] varchar(100) NOT NULL, // Non additive measure
[transaction_id][int] NOT NULL, // Additive measure
CONSTRAINT [PK_tbl_olap_fact] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
transaction_id
- 这个字段,我将汇总(状态数)。
接下来,我添加事实表和维度表之间的关系:
ALTER TABLE [dbo].[tbl_olap_fact] ADD CONSTRAINT [FK_tbl_olap_fact_tbl_date_dim] FOREIGN KEY([date_dim])
REFERENCES [dbo].[tbl_date_dim] ([date_key]);
ALTER TABLE [dbo].[tbl_olap_fact] ADD CONSTRAINT [FK_tbl_olap_fact_tbl_applicationstatus] FOREIGN KEY([status_id])
REFERENCES [dbo].[tbl_applicationstatus] ([ID]);
然后我填写事实表:
INSERT INTO [dbo].[tbl_olap_fact]
([transaction_id],
[status_id],
[staus_name],
[date_dim])
(
SELECT DISTINCT
core.id as 'transaction_id',
core_status.ID as 'status_id',
core_status.name as 'status_name',
CAST(YEAR(core.VersionDate) as VARCHAR(4)) +
RIGHT('00' + CAST(MONTH(core.VersionDate) as VARCHAR(2)) ,2) +
RIGHT('00' + CAST(DAY(core.VersionDate) as VARCHAR(2)), 2) as 'date_dim'
FROM
[dbo].[tbl_application] as core
inner join tbl_applicationstatus as core_status
on core.ApplicationStatusID = core_status.ID
WHERE IsRaw = 0
);
作为 OLAP 服务器,我使用的是 Mondrian。定义多维数据库逻辑模型的蒙德里安模式:
<Schema name="olap_schema">
<Dimension type="TimeDimension" visible="true" highCardinality="false" name="Date first dim">
<Hierarchy name="date_hierarchy" visible="true" hasAll="true" primaryKey="date_key" description="">
<Table name="tbl_date_dim" schema="dbo">
</Table>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Year"
nameColumn="Year"
type="Numeric"
uniqueMembers="true"
levelType="TimeYears"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Month"
nameColumn="Month"
ordinalColumn="Month"
type="Numeric"
uniqueMembers="false"
levelType="TimeMonths"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Day"
nameColumn="Day"
ordinalColumn="Day"
type="Numeric"
uniqueMembers="false"
levelType="TimeDays"
hideMemberIf="Never"
description="">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="TimeDimension" visible="true" highCardinality="false" name="Date second dim">
<Hierarchy name="date_hierarchy" visible="true" hasAll="true" primaryKey="date_key" description="">
<Table name="tbl_date_dim" schema="dbo">
</Table>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Year"
nameColumn="Year"
type="Numeric"
uniqueMembers="true"
levelType="TimeYears"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Month"
nameColumn="Month"
ordinalColumn="Month"
type="Numeric"
uniqueMembers="false"
levelType="TimeMonths"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Day"
nameColumn="Day"
ordinalColumn="Day"
type="Numeric"
uniqueMembers="false"
levelType="TimeDays"
hideMemberIf="Never"
description="">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="StandardDimension" visible="true" highCardinality="false" name="Status dimension">
<Hierarchy name="status_hierarchy" visible="true" hasAll="true" primaryKey="ID" description="">
<Table name="tbl_applicationstatus" schema="dbo">
</Table>
<Level name=""
visible="true"
table="tbl_applicationstatus"
column="Name"
nameColumn="Name"
type="String"
uniqueMembers="true"
levelType="Regular"
hideMemberIf="Never"
description="">
</Level>
</Hierarchy>
</Dimension>
<Cube name="enrollment_cube" caption="" visible="true" description="" cache="true" enabled="true">
<Table name="tbl_olap_fact" schema="dbo">
</Table>
<DimensionUsage source="Date first dim" name="X axis" caption="" visible="true" foreignKey="date_dim" highCardinality="false">
</DimensionUsage>
<DimensionUsage source="Date second dim" name="Y axis" caption="" visible="true" foreignKey="date_dim" highCardinality="false">
</DimensionUsage>
<DimensionUsage source="Status dimension" name="Z axis" caption="" visible="true" foreignKey="status_id" highCardinality="false">
</DimensionUsage>
<Measure name="TotalCount" column="transaction_id" aggregator="count" caption="Total" visible="true">
</Measure>
</Cube>
</Schema>
作为 OLAP 客户端,我使用的是 Saiku Analytics。
基本上,我得到了正确的数据——但不太确定。例如,我用来填充事实表的方式是否正确?我是否正确构建 ETL 流程?这是一种测试模式,我在构建数据仓库和多维模型方面做了一些实验。