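Since your data is an adjacency matrix, the corresponding CLUTO input file is a so-called GraphFile rather than a MatrixFile, so doc2mat does not help here.
The program txt2graph.pl converts a file like your example "animal.txt" into a graph file and a row label file: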
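#!/usr/bin/perl
use strict;
use warnings;
my @F = split ' ', <>; # begin reading txt file, read column headers
(my $GraphFile = $ARGV) =~ s/(\.txt)?$/.graph/; # e.g. animal.txt -> animal.graph
my $LabelFile = $GraphFile.".rlabel";
open my $label_fh, '>', $LabelFile or die "Cannot write $LabelFile: $!";
open my $graph_fh, '>', $GraphFile or die "Cannot write $GraphFile: $!";
print {$graph_fh} scalar(@F), "\n"; # output number of vertices=objects=columns=rows
while (<>)
{ # process each object row
my ($name, $numbers) = split ' ', $_, 2; # split into name, numbers
print {$label_fh} "$name\n"; # output name
print {$graph_fh} $numbers; # output numbers (still ends with a newline)
}
close $label_fh; close $graph_fh;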
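To make the conversion easier to follow, here is a minimal in-memory sketch of the same transformation. The sample rows and the helper name `txt_to_graph` are hypothetical, and the layout (a header row of column names, then one row per object starting with its name) is an assumption about what "animal.txt" looks like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input in the assumed layout: a header row of column names,
# then one row per object (its name followed by its adjacency values).
my @lines = (
    "cat dog fish\n",
    "cat 0 1 0\n",
    "dog 1 0 0\n",
    "fish 0 0 0\n",
);

# Turn the input lines into the contents of the graph file and the label file.
sub txt_to_graph {
    my @rows   = @_;
    my @header = split ' ', shift @rows;     # column headers
    my $graph  = scalar(@header) . "\n";     # first line: number of vertices
    my $labels = '';
    for my $row (@rows) {
        my ($name, $numbers) = split ' ', $row, 2;
        $labels .= "$name\n";                # one label per object
        $graph  .= $numbers;                 # the numbers keep their newline
    }
    return ($graph, $labels);
}

my ($graph, $labels) = txt_to_graph(@lines);
print "--- graph file ---\n$graph";
print "--- row label file ---\n$labels";
```

The graph text starts with the vertex count and is followed by the bare matrix rows, while the label text holds the object names in the same row order.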
After CLUTO has done the clustering, the program pclusters.pl prints the result in your desired output format:
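#!/usr/bin/perl
use strict;
use warnings;
# derive the label file name, e.g. animal.graph.clustering.2 -> animal.graph.rlabel
(my $LabelFile = $ARGV[0]) =~ s/(\.clustering\.\d+)?$/.rlabel/;
open my $label_fh, '<', $LabelFile or die "Cannot open $LabelFile: $!";
chomp(my @label = <$label_fh>); close $label_fh; # read labels
my @cluster;
while (<>)
{
chomp; # line N holds the cluster number of object N
push @{ $cluster[$_] }, $label[$.-1]; # add label to its cluster (created on first use)
}
foreach my $cluster (@cluster)
{
next unless $cluster; # skip cluster numbers that were never assigned
print "(", join(', ', @$cluster), ")\n"; # print a cluster's labels
}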
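The grouping step can also be tried without any files. The following is a small sketch with made-up labels and cluster numbers; the helper name `group_labels` is hypothetical, and it assumes cluster numbers are non-negative integers, one per object in row order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: labels in row order and one cluster number per object.
my @label      = qw(cat dog fish bird);
my @assignment = (0, 0, 1, 0);

# Collect the labels of each cluster and format every cluster as "(a, b, c)".
sub group_labels {
    my ($labels, $assignments) = @_;
    my @cluster;
    for my $i (0 .. $#$assignments) {
        # dereferencing the slot creates the cluster's array on first use
        push @{ $cluster[ $assignments->[$i] ] }, $labels->[$i];
    }
    # a cluster number that never occurs yields an empty "()"
    return map { '(' . join(', ', @{ $_ || [] }) . ')' } @cluster;
}

print "$_\n" for group_labels(\@label, \@assignment);
```

With the sample data above this prints `(cat, dog, bird)` followed by `(fish)`.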
The whole procedure is then:
> txt2graph.pl animal.txt
> scluster animal.graph 2
> pclusters.pl animal.graph.clustering.2
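The three steps are tied together only by file names, so it can help to see how each name is derived. This sketch reuses the two substitutions from the scripts; the helper names `graph_file_for` and `label_file_for` are hypothetical, and the `.clustering.2` suffix is taken from the command line shown above rather than from the CLUTO documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same substitution as in txt2graph.pl: replace a trailing ".txt" (if any) with ".graph".
sub graph_file_for {
    my ($txt) = @_;
    (my $graph = $txt) =~ s/(\.txt)?$/.graph/;
    return $graph;
}

# Same substitution as in pclusters.pl: replace a trailing ".clustering.N" with ".rlabel".
sub label_file_for {
    my ($clustering) = @_;
    (my $label = $clustering) =~ s/(\.clustering\.\d+)?$/.rlabel/;
    return $label;
}

my $graph      = graph_file_for('animal.txt');
my $clustering = "$graph.clustering.2";      # name assumed from the commands above
print "graph file:      $graph\n";
print "clustering file: $clustering\n";
print "label file:      ", label_file_for($clustering), "\n";
```

Both scripts therefore agree on `animal.graph.rlabel` as the label file, which is why pclusters.pl can find the labels given only the clustering file name.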