我的任务是编写一个简短的程序来创建一个系统发育树(叶节点中的所有数据)给定一个三元组数据集,形式为(b,c,a),其中a是最不常见的祖先b和c。_
我有一份 Aho、Sagiv、Szymanski 和 Ullman 的论文“Inferring a Tree from Lowest Common Ancestors with an Application to the Optimization of Relational Expressions”(SIAM J. Computing 第 10 卷第 3 期,1981 年 8 月)。我已经完成了算法的描述,但被困在标记为“归纳”的部分,叶子是由它们的共同祖先选择的。在网上查找信息会发现我的页面往往会跳过这部分。我认为这很明显,我错过了什么?
到目前为止我所拥有的(在 perl 中,尚未分解为子例程):
use strict;
use warnings;
use Graph;
use Tree::DAG_NODE;
my @triplets;
while (<>) {
push @triplets, [ split ];
}
#
# In order to generate the G(L) extract the first two
# columns(edges) of @triplets to make the edges of the auxiliary
# graph. Then pass this array of edges to the Graph constructor.
#
my @auxiliary_edges = map { [@$_[0,1]] } @triplets;
my $graph = Graph->new(
undirected => 1,
edges => \@auxiliary_edges
);
my @components = $auxiliary_graph->connected_components;
die "No Tree!" if (@components == 1); # No tree!
my $root = Tree::DAG_Node->new;
$root->name('_ROOT');
#
# Each connected component is a set of leaves.
#
for my $component (@components)
{
print "Component (leaf set): [", join(", ", @$component), "]\n";
#
# To be written: induced(), which is the heart of my problem,
# and add_phylo_tree(), which isn't.
#
my @matches = induced(\@triplets, $component);
add_phylo_tree($root, @matches);
}
print_tree($root); # To be written. I'm using Data::Dumper right now.
exit(0);
数据文件:
b c a
a c d
d e b
搜索(https://www.google.com/search?q=phylogenetic+tree+from+triplets是典型的)到目前为止还没有给我有用的信息,尽管Triplet Methods (PDF) 接近了(我怀疑有一个错误虽然在图表中)。
我还阅读了 Snir 和 Rao 的“Using Max Cut to Enhance Rooted Trees Consistency”,IEEE/ACM Transactions on Computaional Biology and Bioinformatics,vol 3 no。2006 年 10 月至 12 月 4 日),这有助于澄清一些问题,但不是阻碍我的问题。
谢谢你。