
I have a database with over 100K records: many categories and many items (each category with different attributes), all stored in EAV.

If I break this scheme and create a dedicated table for each category, is that something I should avoid?

Yes, I know I would end up with a lot of tables, and if I ever want to add an extra field I'd have to alter them, but is that wrong?

I've also read that with many tables the database will be spread across more files, which is bad for any filesystem.

Any advice?


4 Answers


As the primary structure of a database design, EAV will fall apart as the data grows. The way you find out that a schema doesn't fit the business model is when you need to query it for reporting, and EAV requires many workarounds and non-native database features to get reasonable reports out. You end up constantly writing crosstab/pivot queries, even for the smallest request. All the processing needed to get EAV data into a queryable shape burns CPU cycles and is extremely error-prone. On top of that, the data size grows geometrically: with 10 attributes, 10 rows in a standard design become 100 EAV rows, 100 standard rows become 1,000 EAV rows, and so on.
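The row multiplication is easy to see in miniature. The sketch below (a hypothetical `product` schema, using SQLite only for illustration) stores the same 10 entities once as a wide table and once as EAV; with 3 attributes, every wide row becomes 3 EAV rows, and with the answer's 10 attributes it would be 10 rows each:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Standard ("wide") design: one row per entity, one column per attribute.
cur.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, color TEXT, weight REAL)")

# EAV design: one row per entity *per attribute*.
cur.execute("CREATE TABLE product_eav (entity_id INTEGER, attribute TEXT, value TEXT)")

for i in range(10):
    cur.execute("INSERT INTO product VALUES (?, ?, ?, ?)", (i, f"p{i}", "red", 1.5))
    for attr, val in [("name", f"p{i}"), ("color", "red"), ("weight", "1.5")]:
        cur.execute("INSERT INTO product_eav VALUES (?, ?, ?)", (i, attr, val))

wide_rows = cur.execute("SELECT COUNT(*) FROM product").fetchone()[0]
eav_rows = cur.execute("SELECT COUNT(*) FROM product_eav").fetchone()[0]
print(wide_rows, eav_rows)  # 10 30 -- the EAV row count is wide rows x attribute count
```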

Database management systems are designed to handle large numbers of tables; that should not be a concern.

A hybrid solution, where an EAV structure is part of the design, can work. The rule, however, must be that you never write a query containing [AttributeCol] = 'Attribute'. That is: you never filter, sort, or restrict a range on any attribute, and you never place a specific attribute in a report or on a screen. It is just a blob of data. Combined with a good architecture for the rest of the system, an EAV store for blob data can be useful. The key to making this work is enforcing, on yourself and your developers, that you never cross the line into filtering or sorting on an attribute. Once you start down the dark path, forever will it dominate your destiny.
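One minimal sketch of that hybrid rule (the `asset` schema and column names here are hypothetical, not from the answer): every attribute that is ever filtered, sorted, or reported on gets a real column, and everything else travels as an opaque blob that the application reads back whole:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hybrid design: queryable attributes are real columns; the rest is a blob.
cur.execute("""
    CREATE TABLE asset (
        id         INTEGER PRIMARY KEY,
        mime_type  TEXT NOT NULL,    -- queried: real, indexable column
        size_bytes INTEGER NOT NULL, -- queried: real column
        extra      TEXT              -- never queried: opaque JSON blob
    )
""")
cur.execute("INSERT INTO asset VALUES (1, 'image/png', 2048, ?)",
            (json.dumps({"bit_depth": 8, "interlaced": False}),))

# Allowed: filter only on the real columns.
asset_id = cur.execute(
    "SELECT id FROM asset WHERE mime_type = 'image/png'").fetchone()[0]
print(asset_id)  # 1

# The blob is fetched whole and interpreted in application code,
# never filtered or sorted on inside SQL.
extra = json.loads(
    cur.execute("SELECT extra FROM asset WHERE id = 1").fetchone()[0])
print(extra["bit_depth"])  # 8
```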

Answered on 2010-05-04T14:42:21.737

There are database engines designed specifically to run EAV models. I don't know them, so I can't recommend one. But pushing an EAV model into a relational engine is a recipe for disaster. The disaster will happen; it's really just a matter of time.

Your data might stay small enough, and your queries simple enough, for it to work, but that is rarely the case.

Answered on 2011-09-20T19:59:57.863

An EAV schema is very flexible for adding what would be "columns" in a relational database, but the price is degraded query performance and the loss of the business logic that a relational schema preserves.

You have to create multiple views just to pivot the results, which causes performance problems once the tables hold billions of rows. Another characteristic of the EAV pattern is that every query joins the data table to the metadata table, and often joins the same data table to itself multiple times.
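The pivot this answer describes can be sketched with one conditional aggregate (or, equivalently, one self-join) per attribute, which is exactly the per-column cost that hurts at scale. A minimal illustration, assuming a generic `eav(entity_id, attribute, value)` table and SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT)")
cur.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "color", "red"),  (1, "weight", "1.5"),
    (2, "color", "blue"), (2, "weight", "2.0"),
])

# Pivoting EAV rows back into columns: one MAX(CASE ...) per attribute.
# Every additional attribute the report needs adds another branch (or,
# in the self-join formulation, another join against the same table).
rows = cur.execute("""
    SELECT entity_id,
           MAX(CASE WHEN attribute = 'color'  THEN value END) AS color,
           MAX(CASE WHEN attribute = 'weight' THEN value END) AS weight
    FROM eav
    GROUP BY entity_id
    ORDER BY entity_id
""").fetchall()
print(rows)  # [(1, 'red', '1.5'), (2, 'blue', '2.0')]
```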

This is based on my experience.

Answered on 2010-04-19T16:10:04.153

I took this approach on an authoring system I built for e-learning about 4 years ago. I didn't know I was doing EAV at the time; I just thought I was being sly using name/value pairs. I figured I'd have more records but less re-design, since I had grown tired of adjusting columns every time we had a change request.

My first test was constructing the hierarchy for the system in one table. That performed great with about 4 projects, 25 products, and 4 to 5 tools each, all assigned through tier integers that link back to their primary keys.

I've been recording assets that pass through the system (FLV, SWF, JPG, PNG, GIF, PDF, MP3 files, etc.) along with all their mime-type specifics, which comes to 4 to 10 attributes per file. It has totaled about 8 million "asset data" rows for roughly 800K assets (est.). Then I got a request to put all that information into columns for a report. The SQL statement would have to do a number of self-joins on the table, let alone the slew of additional JOINs needed if they also want to know the content, product, or project each asset was used in.

From a granular perspective it works great. From an Excel-report perspective, put your seat belt on. I've mitigated it by taking snapshots out to tables that reflect the data the way someone wants it in a report, but it takes a while to compile that information, which required me to offload (SQL dump) to another server.
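The snapshot approach described above can be sketched as a one-shot materialization: pivot the EAV rows once, off the hot path, into a flat reporting table that the report then reads directly. The `asset_data`/`asset_report` names and attributes below are hypothetical stand-ins, using SQLite for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE asset_data (asset_id INTEGER, attribute TEXT, value TEXT)")
cur.executemany("INSERT INTO asset_data VALUES (?, ?, ?)", [
    (1, "mime_type", "image/png"), (1, "size", "2048"),
    (2, "mime_type", "audio/mp3"), (2, "size", "900"),
])

# Materialize the expensive pivot once into a flat snapshot table,
# so reports read plain rows instead of re-pivoting live EAV data.
cur.execute("""
    CREATE TABLE asset_report AS
    SELECT asset_id,
           MAX(CASE WHEN attribute = 'mime_type' THEN value END) AS mime_type,
           MAX(CASE WHEN attribute = 'size'      THEN value END) AS size
    FROM asset_data
    GROUP BY asset_id
""")

report = cur.execute(
    "SELECT * FROM asset_report ORDER BY asset_id").fetchall()
print(report)  # [(1, 'image/png', '2048'), (2, 'audio/mp3', '900')]
```

In practice the snapshot would be rebuilt on a schedule (or dumped to a separate reporting server, as the answer mentions), trading freshness for query speed.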

I've found myself asking whether this was the right thing to do, and for this project, up until this request for a grand-scale report, I could say "yes". But it makes the server sweat pretty badly correlating it all. It really depends on how deep their queries go.

I've dabbled with SQL since 2002 and use it in supporting tools, nothing on a huge scale, and it has survived. If it were a larger database (millions of users, terabytes of data) I'd probably be pulling my hair out.

Special note: I found out this system was running on RedHat, 32-bit. Many of the PHP processing threads couldn't use more than 1 CPU core, so the server had 7 more cores sitting idle! Queries that took up to 45 minutes on that machine could run in 14-25 seconds on a properly configured 64-bit system. More food for thought when considering performance.

Answered on 2011-10-20T17:29:51.717