I'm looking for Python code that can take tabular data, determine which normal form (if any) it is in, and show any functional dependencies, etc.

1 Answer

There are logical tests for "normalization". However, they're not trivial exercises in programming; they're relationships in the metadata that are imposed on the data. They require "thinking".

1NF -- no repeating groups. How does one identify a "repeating group"? It would be an array structure imposed on the columns of a table. How is that done? SQL doesn't provide a mechanism, so you'd have to look at the column names to check for a "pattern". COL_1, COL_2, COL_3, for example.
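The column-name check can be sketched in a few lines. This is only a heuristic under an assumed naming convention (a shared prefix followed by a number); `find_repeating_groups` and the sample column names are made up for illustration, and a match is a hint to look closer, not proof of a 1NF violation.

```python
import re
from collections import defaultdict

def find_repeating_groups(column_names, min_size=2):
    """Group column names that share a prefix and differ only by a trailing number."""
    groups = defaultdict(list)
    for name in column_names:
        # Lazy prefix, optional underscore, then the trailing digits.
        match = re.fullmatch(r"(.*?)_?(\d+)", name)
        if match and match.group(1):
            groups[match.group(1)].append(name)
    # A single numbered column is not a group, so require at least min_size.
    return {prefix: names for prefix, names in groups.items() if len(names) >= min_size}

columns = ["ID", "NAME", "COL_1", "COL_2", "COL_3", "PHONE1", "PHONE2", "ZIP5"]
suspects = find_repeating_groups(columns)
print(suspects)
```

Here `ZIP5` is left out because it has no siblings. Columns like `HOME_PHONE` and `WORK_PHONE` would slip through, which is why this still needs a human to look at the result.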

1NF -- consistent layout of rows. Duh. SQL imposes this by the very nature of table definition.

2NF -- data in a row depends on the whole key, not just on part of a composite key. You'd have to follow a procedure something like this.

For each non-key column:
   Query distinct pairs (Key and the non-key column)
   Do all non-key values depend in a consistent way on a key value?
   Can you build a simple dict mapping each key value to exactly one non-key value?

The full algorithm is here: http://en.wikipedia.org/wiki/Relational_model#Set-theoretic_formulation
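A minimal sketch of the dict test, assuming the rows are already loaded as a list of dicts and that you supply the key columns yourself. The names `holds` and `partial_dependencies` and the order-line sample data are hypothetical. Note that the data can only refute a dependency: one that holds in the sample may be a coincidence, so a hit here is a candidate to think about, not a verdict.

```python
from itertools import combinations

def holds(rows, determinant, dependent):
    """True if each value of the determinant columns maps to one dependent value."""
    seen = {}
    for row in rows:
        key_value = tuple(row[col] for col in determinant)
        # setdefault stores the first value seen and returns it on later lookups.
        if seen.setdefault(key_value, row[dependent]) != row[dependent]:
            return False
    return True

def partial_dependencies(rows, key, non_key):
    """List (key subset, column) pairs where part of the key alone determines the column."""
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            for col in non_key:
                if holds(rows, subset, col):
                    found.append((subset, col))
    return found

rows = [
    {"order_id": 1, "product_id": "A", "product_name": "Widget", "qty": 2},
    {"order_id": 1, "product_id": "B", "product_name": "Gadget", "qty": 1},
    {"order_id": 2, "product_id": "A", "product_name": "Widget", "qty": 5},
]
key = ("order_id", "product_id")
non_key = ("product_name", "qty")
suspects = partial_dependencies(rows, key, non_key)
print(suspects)
```

In this sample `product_name` appears to depend on `product_id` alone, which is the kind of partial dependency 2NF rules out.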

3NF -- data in a row depends ONLY on the key. This is worse, because you have to compare all combinations of non-key columns against all combinations of non-key columns to be sure that there are no non-key dependencies among the values.
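The brute-force comparison might look like the sketch below. It uses a counting test (a set of columns determines another column when adding that column creates no new distinct combinations) and caps the size of the left-hand side with a hypothetical `max_size` parameter, because the number of combinations grows quickly. The function name and the employee data are assumptions for illustration.

```python
from itertools import combinations

def non_key_dependencies(rows, non_key, max_size=2):
    """List (lhs columns, rhs column) pairs among non-key columns that hold in rows."""
    found = []
    for size in range(1, max_size + 1):
        for lhs in combinations(non_key, size):
            lhs_values = {tuple(r[c] for c in lhs) for r in rows}
            for rhs in non_key:
                if rhs in lhs:
                    continue  # skip trivial dependencies
                pairs = {tuple(r[c] for c in lhs) + (r[rhs],) for r in rows}
                # Equal counts mean every lhs value pairs with exactly one rhs value.
                if len(pairs) == len(lhs_values):
                    found.append((lhs, rhs))
    return found

rows = [
    {"emp_id": 1, "dept": "Sales", "city": "Oslo", "grade": 3},
    {"emp_id": 2, "dept": "Sales", "city": "Oslo", "grade": 4},
    {"emp_id": 3, "dept": "IT", "city": "Bergen", "grade": 3},
    {"emp_id": 4, "dept": "IT", "city": "Bergen", "grade": 4},
]
suspects = non_key_dependencies(rows, ("dept", "city", "grade"), max_size=1)
print(suspects)
```

With this sample, `dept` and `city` determine each other, so one of them probably belongs in a separate table. Whether that is a real rule or an accident of four rows is still a judgment call.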

4NF and 5NF confuse me, so I'll stop here.

My point is that -- theoretically -- you can do it. Practically, it's a lot of complex permutations of data to assert that the normal form relationships actually hold.

It's much, much easier to have a hypothesis about a specific violation and probe just that issue with some SQL queries and some thinking.
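For example, if the hypothesis is "department determines city", a single GROUP BY query finds the counterexamples. The sketch below uses the standard-library `sqlite3` module with an in-memory database; the `employee` table, its columns, and the `violations` helper are invented for illustration.

```python
import sqlite3

def violations(conn, table, determinant, dependent):
    """Return determinant values that map to more than one distinct dependent value."""
    # Identifiers are interpolated into the SQL text, so pass only trusted names.
    # COUNT(DISTINCT ...) ignores NULLs, so NULL dependents are not counted.
    sql = (
        f'SELECT "{determinant}", COUNT(DISTINCT "{dependent}") '
        f'FROM "{table}" GROUP BY "{determinant}" '
        f'HAVING COUNT(DISTINCT "{dependent}") > 1 '
        f'ORDER BY "{determinant}"'
    )
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Sales", "Oslo"), (2, "Sales", "Oslo"), (3, "IT", "Bergen"), (4, "IT", "Oslo")],
)
bad = violations(conn, "employee", "dept", "city")
print(bad)
```

An empty result means the data does not contradict the hypothesis; any rows returned are concrete cases to go and investigate.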

The formal math is here:

http://en.wikipedia.org/wiki/Relational_model

answered 2010-01-28T21:39:05.327