I'm looking for Python code that can take tabular data, determine which normal form (if any) it is in, and show any functional dependencies, etc.
1 Answer
There are logical tests for "normalization". However, they're not trivial exercises in programming; they're relationships in the metadata that are imposed on the data. They require "thinking".
1NF -- no repeating groups. How does one identify a "repeating group"? It would be an array structure imposed on the columns of a table. How is that done? SQL doesn't provide a mechanism, so you'd have to look at the column names for a naming pattern such as COL_1, COL_2, COL_3.
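A rough sketch of that column-name check in Python, assuming repeating groups follow a `STEM_N` or `STEMN` naming convention. `find_repeating_groups` is a hypothetical helper, and a naming heuristic like this will miss groups whose columns are named differently (for example HOME_PHONE, WORK_PHONE).

```python
import re
from collections import defaultdict

def find_repeating_groups(columns):
    """Group column names that share a stem and differ only by a trailing number.

    Heuristic only: assumes repeating groups are named like COL_1, COL_2, ...
    """
    pattern = re.compile(r"^(.*?)_?(\d+)$")  # stem, optional underscore, trailing digits
    groups = defaultdict(list)
    for name in columns:
        match = pattern.match(name)
        if match:
            groups[match.group(1)].append(name)
    # A single numbered column is not a "group"; keep stems seen at least twice.
    return {stem: names for stem, names in groups.items() if len(names) > 1}

print(find_repeating_groups(["ID", "PHONE_1", "PHONE_2", "PHONE_3", "NAME"]))
# {'PHONE': ['PHONE_1', 'PHONE_2', 'PHONE_3']}
```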
1NF -- consistent layout of rows. Duh. SQL imposes this by the very nature of table definition.
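2NF -- data in a row depends on the whole key, not on just part of it (this only matters when the key has more than one column). You'd have to do a procedure something like this.
For each non-key column, and for each proper subset of the key columns:
Query the distinct pairs (key subset, non-key column).
Does each key-subset value always pair with the same non-key value?
Can you build a simple dict mapping each key-subset value to exactly one non-key value? If you can, that column depends on only part of the key, which violates 2NF.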
The full algorithm is here: http://en.wikipedia.org/wiki/Relational_model#Set-theoretic_formulation
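A minimal Python sketch of that procedure, assuming the table has been loaded as a list of dicts (one per row) and the key columns are already known. `holds_fd` and `partial_dependencies` are hypothetical helpers, not a library API. It only tests the rows you actually have: it can show that a dependency is violated, but a dependency that happens to hold in the sample may be a coincidence rather than a rule.

```python
from itertools import combinations

def holds_fd(rows, lhs, rhs):
    """True if the columns in `lhs` determine column `rhs` in every row of `rows`.

    Builds a dict mapping each lhs value to the rhs value seen with it; a second,
    different rhs value for the same lhs value means the dependency fails.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def partial_dependencies(rows, key, non_key):
    """Return (key subset, column) pairs where a proper subset of `key` determines a non-key column."""
    found = []
    for size in range(1, len(key)):  # proper subsets only, never the whole key
        for subset in combinations(key, size):
            for col in non_key:
                if holds_fd(rows, subset, col):
                    found.append((subset, col))
    return found

# Hypothetical order-line data with composite key (order_id, product_id).
rows = [
    {"order_id": 1, "product_id": "A", "product_name": "Apple", "quantity": 3},
    {"order_id": 1, "product_id": "B", "product_name": "Banana", "quantity": 1},
    {"order_id": 2, "product_id": "A", "product_name": "Apple", "quantity": 5},
]
print(partial_dependencies(rows, ("order_id", "product_id"), ["product_name", "quantity"]))
# [(('product_id',), 'product_name')]
```

Here product_name depends on product_id alone, so in a 2NF design it would move to a separate products table.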
3NF -- data in a row depends ONLY on the key. This is worse, because you have to test every combination of non-key columns against every other non-key column to be sure that no non-key column depends on other non-key columns.
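The same dict-based check extends to 3NF: try each combination of non-key columns as a determinant for each remaining non-key column. This sketch (hypothetical helper names again, rows as a list of dicts) skips a determinant when a smaller one for the same column was already found, so only minimal dependencies are reported. With small samples, spurious dependencies are common, so treat the output as hypotheses to check, not as facts about the design.

```python
from itertools import combinations

def holds_fd(rows, lhs, rhs):
    """True if the columns in `lhs` determine column `rhs` in every row of `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def non_key_dependencies(rows, non_key):
    """Return minimal (determinant, column) pairs among the non-key columns only."""
    found = []
    for size in range(1, len(non_key)):
        for lhs in combinations(non_key, size):
            for col in non_key:
                if col in lhs:
                    continue  # trivial: a column always determines itself
                # Skip if a smaller determinant for `col` was already found.
                if any(c == col and set(prev) <= set(lhs) for prev, c in found):
                    continue
                if holds_fd(rows, lhs, col):
                    found.append((lhs, col))
    return found

# Hypothetical employee data: key is emp_id; dept_name depends on dept_id.
rows = [
    {"emp_id": 1, "dept_id": 10, "dept_name": "Sales", "salary": 50},
    {"emp_id": 2, "dept_id": 10, "dept_name": "Sales", "salary": 60},
    {"emp_id": 3, "dept_id": 20, "dept_name": "IT", "salary": 60},
]
print(non_key_dependencies(rows, ["dept_id", "dept_name", "salary"]))
# [(('dept_id',), 'dept_name'), (('dept_name',), 'dept_id')]
```

Both results point at the same fix: dept_id and dept_name belong in their own departments table.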
4NF and 5NF confuse me, so I'll stop here.
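My point is that -- theoretically -- you can do it. Practically, it means checking a very large number of column combinations against the data, a number that grows exponentially with the column count, to assert that the normal form relationships actually hold. And even then, the data can only show that a dependency is violated; it can't prove that a dependency is part of the design.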
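To put a number on it: counting only dependencies with a single column on the right-hand side, each of n columns can be tested against every non-empty subset of the other n - 1 columns, giving n(2^(n-1) - 1) candidates. This hypothetical helper just evaluates that formula; each candidate is a full pass over the data.

```python
def candidate_fd_count(n_columns):
    """Count candidate dependencies X -> A, where A is one column and X is a
    non-empty subset of the remaining n_columns - 1 columns."""
    return n_columns * (2 ** (n_columns - 1) - 1)

for n in (3, 10, 20):
    print(n, candidate_fd_count(n))
# 3 9
# 10 5110
# 20 10485740
```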
It's much, much easier to have a hypothesis about a specific violation and probe just that issue with some SQL queries and some thinking.
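For example, to probe the hypothesis that customer_id determines customer_name, one GROUP BY query lists the counterexamples. This sketch uses Python's built-in sqlite3 module with a hypothetical in-memory orders table; the query itself is plain SQL and should carry over to other databases.

```python
import sqlite3

# Hypothetical table; the hypothesis under test is customer_id -> customer_name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, customer_name TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 7, "Ada"), (2, 7, "Ada"), (3, 8, "Grace"), (4, 8, "G. Hopper")],
)

# Any customer_id that appears with more than one name breaks the hypothesis.
# Note: COUNT(DISTINCT ...) ignores NULL names.
violations = conn.execute(
    """
    SELECT customer_id, COUNT(DISTINCT customer_name) AS names
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(DISTINCT customer_name) > 1
    """
).fetchall()
print(violations)  # [(8, 2)]
```

An empty result means the data doesn't contradict the hypothesis; whether the dependency should hold is still a design question.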
The formal math is here: