I am trying to convert some numbers between decimal and binary. I am working with data produced in the following format:

Example decimal: 163,   Corresponding binary: 10100011

Binary table key:

[image]

...and the corresponding description for the binary number in question:

[image]

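I want to be able to take a decimal number, convert it to binary, and then use this lookup table to print the list of properties for the given decimal. I can convert decimal to binary with the following code: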

sub dec2bin {
    my $str = unpack("B32", pack("N", shift));
    $str =~ s/^0+(?=\d)//;   # otherwise you'll get leading zeros
    return $str;
}

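But then I can't see how to use the lookup table. The problem is that I have binary numbers specifically designed to be compatible with this table, such as 1000011, 10000011 and 101110011, but I just don't know how to use them to pull out their descriptions. They are even different lengths!

Can somebody help me understand what is going on here?

Edit: here is another lookup table I found... maybe it is more accurate/helpful? It looks identical to me, but it comes from the software's official website.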

[image]

3 Answers

An even simpler approach might be to check each key in the map and compare it directly against the converted number.

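sub get_descriptions {
   my $number = shift;   # the numeric flag value, e.g. 163 (not a string of 0s and 1s)
   my @descriptions;

   # %description_map maps each bit value (1, 2, 4, ...) to its description;
   # sort the keys numerically so the output order is stable
   for my $k (sort { $a <=> $b } keys %description_map) {
      # bitwise comparison; "+ 0" forces a numeric AND even if a value is a string
      if( ($k + 0) & ($number + 0) ) {
         # add description because this bit is set
         push @descriptions, $description_map{$k};
      }
   }

   # full listing of all descriptions for the set bits
   return @descriptions;
}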
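
The sub above relies on a `%description_map` hash that is not shown. As a rough, self-contained sketch of how it could be set up: the hash contents below are an assumption, filled in from the first eight rows of the table quoted in the next answer, with the bit values (1, 2, 4, ...) as keys.

```perl
use strict;
use warnings;

# Assumed layout: bit value => description (first eight flags only).
my %description_map = (
    1   => 'the read is paired in sequencing',
    2   => 'the read is mapped in a proper pair',
    4   => 'the query sequence itself is unmapped',
    8   => 'the mate is unmapped',
    16  => 'strand of the query (1 for reverse)',
    32  => 'strand of the mate',
    64  => 'the read is the first read in a pair',
    128 => 'the read is the second read in a pair',
);

sub get_descriptions {
    my $flags = 0 + shift;    # force numeric so & is a numeric AND
    # Walk the bit values in ascending order and keep those set in $flags.
    return map  { $description_map{$_} }
           grep { ($_ + 0) & $flags }
           sort { $a <=> $b } keys %description_map;
}

print "$_\n" for get_descriptions(163);
```

For 163 (binary 10100011) this lists the descriptions for bit values 1, 2, 32 and 128.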
answered 2013-06-17T17:12:54.353
The table is in base 16, so just convert it to base 2 (I copied/pasted the table from another forum; please fix it if it differs from your screenshot):

0000000001 the read is paired in sequencing
0000000010 the read is mapped in a proper pair
0000000100 the query sequence itself is unmapped
0000001000 the mate is unmapped
0000010000 strand of the query (1 for reverse)
0000100000 strand of the mate
0001000000 the read is the first read in a pair
0010000000 the read is the second read in a pair

ETC...

To get the right descriptions in your format, the code would be:

my @descriptions = (
   "the read is paired in sequencing"
  ,"the read is mapped in a proper pair"
  #...
);
check_number(163); # Note that you don't need to convert to binary :)

sub check_number {
    my $number = shift;
    my $bitmask = 1; # doubled on every iteration: 1, 2, 4, 8, ...
    for (my $i = 0; $i < @descriptions; $i++) {
        my $match = $bitmask & $number ? 1 : 0; # is the bit set?
        print "|$match| $descriptions[$i] | \n";
        $bitmask *= 2; # or bit-shift ($bitmask <<= 1): faster but less readable
    }
}

The output of my test code is (sorry, I got lazy about copy/pasting the description strings, so I faked them):

$ perl5.8 17152880.pl
|1| the read is paired in sequencing |
|1| the read is mapped in a proper pair |
|0| 3 |
|0| 4 |
|0| 5 |
|1| 6 |
|0| 7 |
|1| 8 |
|0| 9 |

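If you only want to print the matching descriptions, change the print statement in the loop to print "$descriptions[$i]\n" if $match;

The benefit of this approach is that it extends easily to a longer table of descriptions.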
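
For reference, here is a sketch of the bit-shift variant that returns only the matching descriptions. The sub name `set_descriptions` is made up for illustration, and the array is assumed to hold the first eight rows of the table above, in bit order.

```perl
use strict;
use warnings;

# Index $i in this array corresponds to bit $i of the flag value.
my @descriptions = (
    'the read is paired in sequencing',
    'the read is mapped in a proper pair',
    'the query sequence itself is unmapped',
    'the mate is unmapped',
    'strand of the query (1 for reverse)',
    'strand of the mate',
    'the read is the first read in a pair',
    'the read is the second read in a pair',
);

sub set_descriptions {
    my $number = 0 + shift;    # force numeric context
    # Bit $i is set when shifting right by $i leaves a 1 in the lowest bit.
    return map  { $descriptions[$_] }
           grep { ($number >> $_) & 1 }
           0 .. $#descriptions;
}

print "$_\n" for set_descriptions(163);
```

Adding a flag then only means appending one more string to `@descriptions`.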

answered 2013-06-17T17:14:28.500
Once a number has been converted, the base it was expressed in on input no longer matters. Internally, think of it as just a number.
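
A small sketch of that point, using the question's example value: the same number written in decimal, in hex, and as binary strings of different lengths (the variable names are only illustrative).

```perl
use strict;
use warnings;

my $from_decimal = 163;
my $from_hex     = 0xA3;
my $from_binary  = oct('0b10100011');     # oct() parses strings with a leading 0b
my $from_padded  = oct('0b0010100011');   # leading zeros do not change the value

my $all_equal = $from_decimal == $from_hex
             && $from_hex     == $from_binary
             && $from_binary  == $from_padded;
print "all four are the same number\n" if $all_equal;

# The same value printed back in three bases.
my $formatted = sprintf '%d %x %b', $from_decimal, $from_decimal, $from_decimal;
print "$formatted\n";   # prints "163 a3 10100011"
```

This is also why binary strings of different lengths are not a problem: a shorter string simply has its higher bits unset.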

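The value 163 represents a bitfield, that is, each of its bits is the answer to some yes-or-no question, and the table tells you how the questions are arranged.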
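
To make the bitfield idea concrete, this sketch breaks a value into the bit values that are set in it. The helper name `set_bit_values` is hypothetical.

```perl
use strict;
use warnings;

sub set_bit_values {
    my $flags = 0 + shift;    # force numeric context
    my @values;
    # Try each power of two in turn and keep the ones present in $flags.
    for (my $mask = 1; $mask <= $flags; $mask <<= 1) {
        push @values, $mask if $flags & $mask;
    }
    return @values;
}

print join(' + ', set_bit_values(163)), "\n";   # prints "1 + 2 + 32 + 128"
```

Each term corresponds to one "yes" answer in the table: 1 = 0x01, 2 = 0x02, 32 = 0x20, 128 = 0x80.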

You can give these bits human-readable names with subs, as in

sub read_is_paired { $_[0] & 0x0001 }
sub read_is_mapped { $_[0] & 0x0002 }
sub strand_of_mate { $_[0] & 0x0020 }
sub read_is_2nd    { $_[0] & 0x0080 }

Decoding the bitfield then looks like

my $flags = 163;
print "read is paired?  ", read_is_paired($flags) ? "YES" : "NO", "\n",
      "read is mapped?  ", read_is_mapped($flags) ? "YES" : "NO", "\n",
      "strand of mate = ", strand_of_mate($flags) ? "1"   : "0",  "\n",
      "read is second?  ", read_is_2nd($flags)    ? "YES" : "NO", "\n";

Output:

read is paired?  YES
read is mapped?  YES
strand of mate = 1
read is second?  YES
answered 2013-06-17T17:17:10.263