sql - Programmatically extracting relationships between tables in an RDBMS without foreign keys?

Question

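I am reverse engineering the relationships between a moderate number of tables (50+) in an Oracle database where no foreign keys are defined between the tables. I can (somewhat) count on being able to match column names across tables. For example, the column name "SomeDescriptiveName" is probably the same across the whole set of tables.

What I'd like is a better way to extract some set of relationships based on those matching column names than going through the tables one by one by hand. I could do something with the Java DatabaseMetaData methods, but this seems like one of those tasks that someone has probably had to script before. Maybe extract the column names with Perl or another scripting language, use each column name as a hash key, and add the table to an array pointed to by that key?

Does anyone have any tips or suggestions that could make this simpler or provide a good starting point? It's an ugly need; if foreign keys had already been defined, understanding the relationships would be much easier.

Thanks.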

score 1 · Accepted Answer

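My strategy would be to use the Oracle system catalog to find columns that have the same column name and data type but belong to different tables, and also to check which of those columns is part of a table's primary key or unique key.

Here's a query that may be close to doing this, but I don't have an Oracle instance handy to test it: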

SELECT col1.table_name || '.' || col1.column_name || ' -> ' 
    || col2.table_name || '.' || col2.column_name
FROM all_tab_columns col1 
  JOIN all_tab_columns col2
    ON (col1.column_name = col2.column_name 
    AND col1.data_type = col2.data_type)
  JOIN all_cons_columns cc
    ON (col2.table_name = cc.table_name 
    AND col2.column_name = cc.column_name)
  JOIN all_constraints con
    ON (cc.constraint_name = con.constraint_name 
    AND cc.table_name = con.table_name 
    AND con.constraint_type IN ('P', 'U'))
WHERE col1.table_name != col2.table_name;

Of course, this won't catch cases where columns are related but have different names.
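
For anyone who wants to try the idea without an Oracle instance, here is a minimal sketch of the same matching rule in Python. It is not the query above: it uses an in-memory SQLite database as a stand-in, reads the catalog through `PRAGMA table_info`, and only treats primary-key columns as the "parent" side (unique constraints are ignored). The function name `candidate_relationships` and the three demo tables are made up for illustration.

```python
import sqlite3

def candidate_relationships(conn):
    """Return sorted (child_table, column, parent_table) triples where the
    column name and declared type match and the parent column is part of
    the parent's primary key."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    columns = []  # (table, column_name, declared_type, is_pk)
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, decl_type, _nn, _dflt, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            columns.append((table, name.upper(), decl_type.upper(), pk > 0))
    pairs = []
    for c_table, c_name, c_type, _ in columns:
        for p_table, p_name, p_type, p_is_key in columns:
            if (c_table != p_table and c_name == p_name
                    and c_type == p_type and p_is_key):
                pairs.append((c_table, c_name, p_table))
    return sorted(pairs)

# Hypothetical demo schema with no foreign keys declared.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE TABLE order_line (line_id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT);
""")
for child, column, parent in candidate_relationships(conn):
    print(f"{child}.{column} -> {parent}.{column}")
# order_line.ORDER_ID -> orders.ORDER_ID
# orders.CUSTOMER_ID -> customer.CUSTOMER_ID
```

Note that the shared `name` column is not reported, because it is not a key in either table; that is the point of joining against the constraint views.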

score 1 · Accepted Answer

You pretty much wrote the answer in your question.

# Map each column name to the list of tables that contain it.
my %column_tables;
foreach my $table (@tables) {
    foreach my $column ($table->columns) {
        push @{ $column_tables{$column} }, $table;
    }
}
print "Likely foreign key relationships:\n";
foreach my $column (keys %column_tables) {
    my @tables = @{ $column_tables{$column} };
    # A column name that occurs in only one table suggests nothing.
    next
        if @tables < 2;
    print $column, ': ';
    foreach my $table (@tables) {
        print $table->name, ' ';
    }
    print "\n";
}

score 1 · Accepted Answer

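You can combine three (or four) approaches, depending on how obfuscated the schema is:

Dynamic approach
- Observation:
  - enable tracing in the RDBMS (or the ODBC layer), then
  - perform various activities in the application (ideally ones that create records), then
  - determine which tables are changed in strict sequence, and with which column-value pairs
  - values that appear in more than one column during a sequence interval may indicate a foreign key relationship
Static approaches (analyze the existing data only, no need to run the application)
- Nomenclature: try to infer relationships from column names
- Statistics: look at the minimum/maximum (and possibly the mean) of the unique values in all numeric columns and try to match them up
- Code reverse engineering: your last resort (unless you are dealing with scripts); not for the faint of heart :)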
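
The statistics item can be sketched roughly like this in Python. This is only one possible reading of "try to match them up": a column is reported as a possible child when the range of its distinct values fits inside another table's column and it has no more distinct values than that column. The names `column_stats`, `range_candidates` and the sample data are hypothetical, and the values are assumed to have been fetched from the database already.

```python
def column_stats(values):
    """Summarize the distinct non-null values of one numeric column."""
    distinct = {v for v in values if v is not None}
    if not distinct:
        return None
    return {"min": min(distinct), "max": max(distinct), "count": len(distinct)}

def range_candidates(columns):
    """columns maps 'table.column' -> list of numeric values.
    Return sorted (child, parent) pairs where the child's range lies inside
    the parent's range and the parent has at least as many distinct values."""
    stats = {name: column_stats(vals) for name, vals in columns.items()}
    result = []
    for child, c in stats.items():
        for parent, p in stats.items():
            if child == parent or c is None or p is None:
                continue
            # Skip pairs within the same table.
            if child.split(".")[0] == parent.split(".")[0]:
                continue
            if (p["min"] <= c["min"] and c["max"] <= p["max"]
                    and c["count"] <= p["count"]):
                result.append((child, parent))
    return sorted(result)

# Hypothetical sample data.
sample = {
    "customer.id": [1, 2, 3, 4, 5],
    "orders.cust": [2, 2, 5, 3],
    "orders.amount": [120, 75, 300, 40],
}
print(range_candidates(sample))  # [('orders.cust', 'customer.id')]
```

Expect false positives with this test (any small integer column fits inside a dense surrogate key), so it is best used to narrow the list before a stricter check.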

score 0 · Accepted Answer

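This is an interesting problem. The approach I took was a brute-force search for columns that match on type and on the values of a small sample set. You will probably have to tune the heuristics to get good results for your schema. I ran it on a schema that did not use auto-increment keys and it worked well. The code was written for MySQL, but it is easy to adapt to Oracle.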

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:mysql:host=localhost;database=SCHEMA", "USER", "PASS");

my @list;
foreach my $table (show_tables()) {
    foreach my $column (show_columns($table)) {
        push @list, { table => $table, column => $column };
    }
}

foreach my $m (@list) {
    my @match;
    foreach my $f (@list) {
        if (($m->{table} ne $f->{table}) &&
            ($m->{column}{type} eq $f->{column}{type}) &&
            (samples_found($m->{table}, $m->{column}{name}, $f->{column}{samples})))
        {
            # For better confidence, add other heuristics such as
            # joining the tables and verifying that every value
            # appears in the master. Also it may be useful to exclude
            # columns in large tables without an index although that
            # heuristic may fail for composite keys.
            #
            # Heuristics such as columns having the same name are too
            # brittle for many of the schemas I've worked with. It may
            # be too much to even require identical types.

            push @match, "$f->{table}.$f->{column}{name}";
        }
    }
    if (@match) {
        print "$m->{table}.$m->{column}{name} $m->{column}{type} <-- @match\n";
    }
}

$dbh->disconnect();

exit;

sub show_tables {
    my $result = query("show tables");
    return ($result) ? @$result : ();
}

sub show_columns {
    my ($table) = @_;
    my $result = query("desc $table");
    my @columns;
    if ($result) {
        @columns = map {
            { name => $_->[0],
              type => $_->[1],
              samples => query("select distinct $_->[0] from $table limit 10") }
        } @$result;
    }
    return @columns;
}

sub samples_found {
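    my ($table, $column, $samples) = @_;
    # An empty sample set would otherwise match every column vacuously.
    return 0 unless $samples && @$samples;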
    foreach my $v (@$samples) {
        my $result = query("select count(1) from $table where $column=?", $v);
        if (!$result || $result->[0] == 0) {
            return 0;
        }
    }
    return 1;
}

sub query {
    my ($sql, @binding) = @_;
    my $result = $dbh->selectall_arrayref($sql, undef, @binding);
    if ($result && $result->[0] && @{$result->[0]} == 1) {
        foreach my $row (@$result) {
            $row = $row->[0];
        }
    }
    return $result;
}
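
The comment in the main loop suggests joining the tables and verifying that every value appears in the master. A minimal sketch of that stricter check, in Python against an in-memory SQLite database rather than MySQL or Oracle, might look like the following. The function name `orphan_count` and the demo tables are hypothetical, and the table and column names are interpolated into the SQL, so they must come from the catalog and not from user input.

```python
import sqlite3

def orphan_count(conn, child_table, child_col, parent_table, parent_col):
    """Count distinct non-null child values with no match in the parent column.
    Zero means every child value appears in the parent."""
    # Identifiers are interpolated: pass only names read from the catalog.
    sql = (
        f'SELECT COUNT(*) FROM ('
        f'  SELECT DISTINCT c."{child_col}" AS v FROM "{child_table}" c'
        f'  WHERE c."{child_col}" IS NOT NULL) d '
        f'WHERE NOT EXISTS ('
        f'  SELECT 1 FROM "{parent_table}" p WHERE p."{parent_col}" = d.v)'
    )
    return conn.execute(sql).fetchone()[0]

# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER, name TEXT);
    CREATE TABLE orders (order_no INTEGER, cust INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 3), (12, 3), (13, NULL);
""")
print(orphan_count(conn, "orders", "cust", "customer", "id"))  # 0
print(orphan_count(conn, "customer", "id", "orders", "cust"))  # 1
```

A count of zero in one direction and non-zero in the other also hints at which side is the master: here `orders.cust` is fully contained in `customer.id`, but not the reverse.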
