
I have a lot of numbers in a database. For example:

448-48-00 #(from 00 to 99, 100 numbers)
336-87-00 #(same as above)
449-20-00 #(from 000 to 999, 1000 numbers)

I need to get the base (the common prefix) of these numbers. For this example, I need to get 44848, 33687 and 4492.

I have this code, but I don't know how to finish it :)

#!/usr/bin/perl

use v5.10;
use warnings;

my @p = 4484900..4484999;
push @p, $_ for 3368700..3368799;

my $data;

do {
    my $z = 1;
    while($z++ <= length $_) {
        $data->{substr $_, 0, $z}++;
    }
} for @p;

foreach my $key (sort { $data->{$a} <=> $data->{$b} } (keys %$data)) {
    say $key if $data->{$key} > 99;
}

I need to keep the longest base for each block and remove the shorter elements that are already covered by it.


2 Answers

#!/usr/bin/perl -l

use strict; use warnings;

my $prefix = "1234";

foreach (<DATA>) {
    print $prefix . $1 . $2 if m/^(\d{3})-(\d{1,2})/;
}

__DATA__
448-48-## (00-99)
336-87-## (-||-)
449-2#-## (0-9, 00-99)

Output:

123444848
123433687
12344492

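If you only want the highest value:

#!/usr/bin/perl -l

use strict; use warnings;

my $prefix  = "1234";
my $highest = 0;    # largest base seen so far, without the prefix

foreach (<DATA>) {
    # skip lines that do not look like 448-48-## or 449-2#-##
    next unless m/^(\d{3})-(\d{1,2})/;
    my $cur = $1 . $2;
    # compare the bare bases; the prefix is only added when printing
    $highest = $cur if $cur > $highest;
}

print $prefix . $highest;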

__DATA__
448-48-## (00-99)
336-87-## (-||-)
449-2#-## (0-9, 00-99)

Output:

123444848
answered 2012-11-21T07:26:26.423

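I tried to understand what you are doing in your code and improved it to do what you want. Disclaimer: it is not that simple. For example, the algorithm cannot see that you do not want to group 44848.. and 4492... into 44....., but that you do want to group 4492... rather than 44924.. and so on. But maybe this already helps you.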

I think the important part is the "smart filter", which for example looks at 336 and 3368 and deletes the count of 336 if it isn't higher than that of 3368 (336 is then just a trivial superset of 3368). What matters here is the string sort together with the state variable $last:

#!/usr/bin/env perl

use strict;
use warnings;
use feature qw(say state);
use List::Util 'shuffle';

# shuffled phone numbers (don't make it too easy)
my @numbers = shuffle (
    4484800 .. 4484899,
    3368700 .. 3368799,
    4492000 .. 4492999
);

my %count = ();

# import phone numbers
foreach my $number (@numbers) {

    # work on all substrings from the beginning
    for (my $pos = 1; $pos <= length $number; $pos++) {
        my $prefix = substr $number, 0, $pos;
        $count{$prefix}++; # increase the number of equal prefixes
    }
}

# smart filter
foreach my $prefix (sort {$a cmp $b} keys %count) {
    state $last //= 'nothing';

    # delete trivial super sets
    if ($prefix =~ /^\Q$last/ and $count{$last} == $count{$prefix}) {
        delete $count{$last};
    }

    # delete trivial sets
    if ($count{$prefix} == 1) {
        delete $count{$prefix};
        next;
    }

    # remember the last prefix
    $last = $prefix;
}

# output
say "$_ ($count{$_})" for sort {
    $count{$b} <=> $count{$a} or $a cmp $b
} keys %count;

The output is correct, but not yet what you want:

44 (1100)
4492 (1000)
33687 (100)
44848 (100)
44920 (100)
44921 (100)
44922 (100)
44923 (100)
44924 (100)
44925 (100)
44926 (100)
44927 (100)
44928 (100)
44929 (100)
336870 (10)
(large list of 10-groups)
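
A side note on the filter above: it only compares each prefix with the one kept just before it, so it depends on the `cmp` ordering, which places a prefix directly in front of its own extensions. A numeric sort would separate them. A minimal sketch, using a hand-picked list of prefixes that is for illustration only:

```perl
use strict;
use warnings;

# illustrative prefixes only, not taken from a real database
my @prefixes = qw(3368 44 336 4492 33687 449);

# string sort: each prefix is followed immediately by its extensions
my @by_string = sort { $a cmp $b } @prefixes;

# numeric sort: shorter numbers come first, so 336 and 3368 are split up
my @by_number = sort { $a <=> $b } @prefixes;

print "cmp: @by_string\n";    # cmp: 336 3368 33687 44 449 4492
print "<=>: @by_number\n";    # <=>: 44 336 449 3368 4492 33687
```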

So if you want to get rid of the 10-groups, you could change

# delete trivial sets
if ($count{$prefix} == 1) {
    delete $count{$prefix};
    next;
}

to

# delete trivial sets
if ($count{$prefix} <= 10) {
    delete $count{$prefix};
    next;
}

Output:

44 (1100)
4492 (1000)
33687 (100)
44848 (100)
44920 (100)
44921 (100)
44922 (100)
44923 (100)
44924 (100)
44925 (100)
44926 (100)
44927 (100)
44928 (100)
44929 (100)

This looks very good. Now it's up to you what to do with the 4492-100-groups and the 44-1100-group. If you want to delete the 100-groups depending on their length, that could also delete the 4492 group in favor of the large 44 group.
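
One possible way to make that decision automatically, as a sketch and not part of the code above: treat a prefix as a base only when its block is completely filled, that is, when its count equals 10 to the power of the number of remaining digits. This assumes that all numbers are distinct and have the same number of digits; the sub name `full_block_prefixes` is made up for this example.

```perl
use strict;
use warnings;

# Return the shortest prefixes whose block of suffixes is completely filled.
# Assumption: all numbers are distinct and have the same number of digits.
sub full_block_prefixes {
    my @numbers = @_;
    return () unless @numbers;
    my $len = length $numbers[0];

    # count how many numbers start with each prefix
    my %count;
    for my $number (@numbers) {
        $count{ substr($number, 0, $_) }++ for 1 .. $len;
    }

    my @bases;
    # string sort puts a prefix directly in front of its extensions
    for my $prefix (sort { $a cmp $b } keys %count) {
        # skip everything already covered by the base kept last
        next if @bases and index($prefix, $bases[-1]) == 0;

        # a block is full when every possible suffix is present
        my $capacity = 10 ** ($len - length $prefix);
        push @bases, $prefix if $count{$prefix} == $capacity;
    }
    return @bases;
}

my @bases = full_block_prefixes(
    4484800 .. 4484899,
    3368700 .. 3368799,
    4492000 .. 4492999,
);
print "$_\n" for @bases;    # prints 33687, 44848 and 4492, one per line
```

With this rule the 44 group (1100 of a possible 100000) is dropped because it is not full, and 44920 to 44929 are dropped because 4492 already covers them. A block with gaps would not be reported at all, so this only fits data where ranges are always complete.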

answered 2012-11-21T10:08:41.547