matlab - Simple way to implement data compression in MATLAB?

Question

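I am working on an assignment where I have to take a large matrix of data and compress it somehow so that its size is more manageable. However, the data must remain usable as input to other things (a toolbox, for example). Here is what I have done so far. For this example matrix, I use the find function to get a matrix containing the indices of all nonzero values. But I don't know how to use that as input so that the original picture information is preserved. I'm curious whether anyone has other, better (simpler) solutions.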

number_1 =     [0 0 0 0 0 0 0 0 0 0 ...
                0 0 1 1 1 1 0 0 0 0 ...     
                0 1 1 0 1 1 0 0 0 0 ...
                0 1 1 0 1 1 0 0 0 0 ...
                0 0 0 0 1 1 0 0 0 0 ...
                0 0 0 0 1 1 0 0 0 0 ...
                0 0 0 0 1 1 0 0 0 0 ...
                0 0 0 0 1 1 0 0 0 0 ...
                0 0 0 0 1 1 0 0 0 0 ...
                0 0 0 0 1 1 0 0 0 0 ...
                0 0 0 0 1 1 0 0 0 0 ...
                0 1 1 1 1 1 1 1 1 0 ...
                0 0 0 0 0 0 0 0 0 0]; 

number = number_1;
compressed_number = find(number);
compressed_number = compressed_number';
disp(compressed_number)

score 1 · Accepted Answer

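If you have only 1s and 0s, and the fill factor is not terribly small, your best bet is to store the numbers as binary numbers; if you need the original dimensions, save them separately. I have expanded the code to show the intermediate steps more clearly, and to show the amount of storage the different arrays need. Note - I reshaped your data into a 13x10 array because it displays better.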

number_1 = [0 0 0 0 0 0 0 0 0 0 ...
    0 0 1 1 1 1 0 0 0 0 ...
    0 1 1 0 1 1 0 0 0 0 ...
    0 1 1 0 1 1 0 0 0 0 ...
    0 0 0 0 1 1 0 0 0 0 ...
    0 0 0 0 1 1 0 0 0 0 ...
    0 0 0 0 1 1 0 0 0 0 ...
    0 0 0 0 1 1 0 0 0 0 ...
    0 0 0 0 1 1 0 0 0 0 ...
    0 0 0 0 1 1 0 0 0 0 ...
    0 0 0 0 1 1 0 0 0 0 ...
    0 1 1 1 1 1 1 1 1 0 ...
    0 0 0 0 0 0 0 0 0 0];

n1matrix = reshape(number_1, 10, [])'; % make it nicer to display;
% transpose because data is stored column-major (row index changes fastest).

disp('the original data in 13 rows of 10:');
disp(n1matrix);

% create a matrix with 8 rows and enough columns
n1 = numel(number_1);
nc = ceil(n1/8); % "enough columns"
npad = zeros(8, nc);
npad(1:n1) = number_1; % fill the first n1 elements: the rest is zero

binVec = 2.^(7-(0:7)); % 128, 64, 32, 16, 8, 4, 2, 1 ... powers of two

compressed1 = uint8(binVec * npad); % 128 * bit 1 + 64 * bit 2 + 32 * bit 3...

% showing what we did...
disp('Organizing into groups of 8, and their decimal representation:')
for ii = 1:nc
    fprintf(1,'%d    ', npad(:, ii));
    fprintf(1, '=  %d\n', compressed1(ii));
end

% now the inverse operation: using dec2bin to turn decimals into binary
% this function returns strings, so some further processing is needed
% original code used de2bi (no typo) but that requires a communications toolbox
% like this the code is more portable
decompressed = dec2bin(compressed1, 8); % ask for 8 digits so leading zeros are always kept
disp('the string representation of the numbers recovered:');
disp(decompressed); % this looks a lot like the data in groups of 8, but it's a string

% now we turn them back into the original array
% remember it is a string right now, and the values are stored
% in column-major order so we need to transpose
recovered = ('1'==decompressed'); % all '1' characters become logical 1
recovered = reshape(recovered(1:n1), 10, [])'; % drop the padding bits, restore 13 rows of 10
display(recovered);

% alternative solution #1: use logical array
compressed2 = (n1matrix==1);
display(compressed2);

recovered = double(compressed2); % looks just the same...

% other suggestions 1: use find
compressed3 = find(n1matrix);  % fewer elements, but each element is 8 bytes
compressed3b = uint8(compressed3);  % only if the array has fewer than 256 elements, so every index fits in uint8

% or use `sparse`
compressed4 = sparse(n1matrix);

% or use logical sparse:
compressed5 = sparse((n1matrix==1));


whos number_1 comp*


the original data in 13 rows of 10:

     0     0     0     0     0     0     0     0     0     0
     0     0     1     1     1     1     0     0     0     0
     0     1     1     0     1     1     0     0     0     0
     0     1     1     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     1     1     1     1     1     1     1     1     0
     0     0     0     0     0     0     0     0     0     0

Organizing into groups of 8, and their decimal representation:
0    0    0    0    0    0    0    0    =  0
0    0    0    0    1    1    1    1    =  15
0    0    0    0    0    1    1    0    =  6
1    1    0    0    0    0    0    1    =  193
1    0    1    1    0    0    0    0    =  176
0    0    0    0    1    1    0    0    =  12
0    0    0    0    0    0    1    1    =  3
0    0    0    0    0    0    0    0    =  0
1    1    0    0    0    0    0    0    =  192
0    0    1    1    0    0    0    0    =  48
0    0    0    0    1    1    0    0    =  12
0    0    0    0    0    0    1    1    =  3
0    0    0    0    0    0    0    0    =  0
1    1    0    0    0    0    0    1    =  193
1    1    1    1    1    1    1    0    =  254
0    0    0    0    0    0    0    0    =  0
0    0    0    0    0    0    0    0    =  0

the string representation of the numbers recovered:
00000000
00001111
00000110
11000001
10110000
00001100
00000011
00000000
11000000
00110000
00001100
00000011
00000000
11000001
11111110
00000000
00000000

compressed2 =

     0     0     0     0     0     0     0     0     0     0
     0     0     1     1     1     1     0     0     0     0
     0     1     1     0     1     1     0     0     0     0
     0     1     1     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     1     1     1     1     1     1     1     1     0
     0     0     0     0     0     0     0     0     0     0


recovered =

     0     0     0     0     0     0     0     0     0     0
     0     0     1     1     1     1     0     0     0     0
     0     1     1     0     1     1     0     0     0     0
     0     1     1     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     0     0     0     1     1     0     0     0     0
     0     1     1     1     1     1     1     1     1     0
     0     0     0     0     0     0     0     0     0     0

  Name              Size             Bytes  Class      Attributes

  compressed1       1x17                17  uint8                
  compressed2      13x10               130  logical              
  compressed3      34x1                272  double          
  compressed3b     34x1                 34  uint8     
  compressed4      13x10               632  double     sparse    
  compressed5      13x10               394  logical    sparse    
  number_1          1x130             1040  double

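As you can see, the original array takes up 1040 bytes; the compressed array takes 17. You get almost 64x compression (not exactly, because 130 is not a multiple of 8); only very sparse data sets would compress better by some other means. The only thing that comes close (and is very fast) is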

compressed3b = uint8(find(number_1));

At 34 bytes it is definitely a contender for small arrays (< 256 elements).
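The bit-packing round trip from the code above can be condensed into a short self-check. This is only a sketch on a small made-up vector (`bits`, `origSize`, `packed` and `unpacked` are illustrative names, not from the original answer); it assumes the data contains nothing but 0s and 1s and that the original size is stored separately.

```matlab
% Sketch: pack a 0/1 vector into bytes, then unpack it again.
bits = [0 1 1 0 1 1 0 0 0 0 1 1 1];      % example data: 13 bits, not a multiple of 8
origSize = size(bits);                    % the size has to be kept separately

% pack: pad to a multiple of 8, then weight each group of 8 bits
n = numel(bits);
nc = ceil(n/8);
padded = zeros(8, nc);
padded(1:n) = bits(:);                    % remaining entries stay zero
packed = uint8(2.^(7:-1:0) * padded);     % first bit of each group is the most significant

% unpack: 8 binary digits per byte, drop the padding, restore the shape
chars = dec2bin(double(packed), 8);       % nc-by-8 char array
unpacked = (chars' == '1');               % 8-by-nc logical, in column-major order
unpacked = reshape(double(unpacked(1:n)), origSize);

assert(isequal(unpacked, bits));          % the round trip is lossless
```

Passing the width `8` to `dec2bin` matters when the largest byte is below 128, because otherwise the leading zeros are dropped and the columns no longer line up with the bit positions.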

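Note - when you save data in Matlab (using save(fileName, 'variableName')), some compression happens automatically. This leads to an interesting and surprising result. When you take each of the variables above and save it to a file using Matlab's save, the file sizes in bytes become: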

number_1     195
compressed1  202
compressed2  213
compressed3  219
compressed3b 222
compressed4  256
compressed5  252
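A comparison like this can be repeated on your own data with `save` and `dir`. The following is a rough sketch, not the procedure used for the table above: the test matrix is random, the temporary file name comes from `tempname`, and the byte counts will depend on the MATLAB version and the MAT-file format, so no expected numbers are given.

```matlab
% Sketch: measure the size on disk of a variable saved with save().
data = double(rand(13, 10) > 0.7);    % hypothetical 0/1 test matrix
asLogical = logical(data);            % same content stored as logical

fname = [tempname '.mat'];            % temporary MAT-file name

save(fname, 'data');
d = dir(fname);
fprintf('double:  %d bytes\n', d.bytes);

save(fname, 'asLogical');             % overwrites the previous file
d = dir(fname);
fprintf('logical: %d bytes\n', d.bytes);

delete(fname);                        % clean up
```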

On the other hand, if you create a binary file yourself, using

fid = fopen('myFile.bin', 'wb');
fwrite(fid, compressed1)
fclose(fid)

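this writes uint8 by default, so the file sizes are 130, 17, 130, 34, 34 (for number_1, compressed1, compressed2, compressed3 and compressed3b respectively) - sparse arrays cannot be written this way. It still shows the "complicated" compression giving the best result.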
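To get the data back from such a file, the bytes can be read with `fread`. A minimal sketch, assuming a temporary file name from `tempname` and using the first five bytes of compressed1 from the output above as sample data:

```matlab
% Sketch: write packed bytes to a raw binary file and read them back.
packed = uint8([0 15 6 193 176]);     % first five bytes of compressed1
fname = [tempname '.bin'];            % temporary file name (assumption)

fid = fopen(fname, 'w');
assert(fid ~= -1, 'could not open file for writing');
fwrite(fid, packed, 'uint8');         % one byte per element
fclose(fid);

fid = fopen(fname, 'r');
readBack = fread(fid, Inf, 'uint8=>uint8')';  % read everything, keep uint8, make it a row
fclose(fid);

info = dir(fname);
fprintf('%d bytes on disk\n', info.bytes);
delete(fname);                        % clean up
```

A raw file like this carries no size information, so the original dimensions (13x10 here) and the element count have to be stored or known separately before the bits can be unpacked.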

score 0 · Accepted Answer

First, you can use the find function to get all the nonzero indices of the array instead of doing it by hand. More information here: http://www.mathworks.com/help/matlab/ref/find.html

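Either way, you will need not only matrix (the output of find) but also the original size. So whenever you pass matrix to anything, you must also pass length(number_1). This is because matrix does not tell you how many 0s follow the last 1. You can work that out by subtracting the last value of matrix from the original length (there may be an off-by-one error in there).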
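As a minimal sketch of that idea (variable names are illustrative, and it assumes the data contains only 0s and 1s), the array can be rebuilt from the find indices plus the stored size:

```matlab
% Sketch: rebuild a 0/1 array from its nonzero indices and its original size.
original = [0 0 1 1 0 1 0 0 0];   % example data with trailing zeros
idx = find(original);             % linear indices of the nonzero entries
sz = size(original);              % must be stored alongside idx

restored = zeros(sz);             % all zeros, same size as the original
restored(idx) = 1;                % put the ones back

assert(isequal(restored, original));
```

Because find returns linear indices, the same two steps work for a matrix as well, as long as `sz` holds both dimensions.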
