c++ - What compression algorithm to use for highly redundant data

Question

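The program uses sockets to transfer highly redundant 2D byte arrays (image-like). Although the transfer rate is comparatively high (10 Mbps), the arrays are also highly redundant (for example, each row may contain several consecutive similar values). I have tried zlib and lz4 and the results were promising, but I am still looking for a better compression method. Keep in mind that it should be relatively fast, like lz4. Any suggestions?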

score 4 · Accepted Answer

You should look at the PNG algorithms for filtering image data before compressing. They range from simple to more sophisticated methods for predicting values in a 2D array based on previous values. To the extent that the predictions are good, the filtering can make for dramatic improvements in the subsequent compression step.

You should simply try these filters on your data, and then feed it to lz4.
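
As a rough illustration of what such filters do, here is a minimal sketch of three of the PNG filter types (Sub, Up, Paeth) applied to a row-major 2D byte array with one byte per element. The names `Filter`, `applyFilter`, and `undoFilter` are made up for this example and the flat `std::vector<std::uint8_t>` layout is an assumption; this is not taken from libpng.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

enum class Filter { Sub, Up, Paeth };

// PNG Paeth predictor: returns whichever of left (a), above (b) or
// upper-left (c) is closest to the estimate a + b - c.
static std::uint8_t paethPredict(int a, int b, int c) {
  int p = a + b - c;
  int pa = std::abs(p - a), pb = std::abs(p - b), pc = std::abs(p - c);
  if (pa <= pb && pa <= pc) return static_cast<std::uint8_t>(a);
  if (pb <= pc) return static_cast<std::uint8_t>(b);
  return static_cast<std::uint8_t>(c);
}

// Prediction for element (y, x) from already-known raw neighbours.
// Neighbours outside the array count as 0, as in PNG.
static std::uint8_t predict(const std::vector<std::uint8_t>& raw,
                            std::size_t width, std::size_t y, std::size_t x,
                            Filter f) {
  int a = x > 0 ? raw[y * width + x - 1] : 0;
  int b = y > 0 ? raw[(y - 1) * width + x] : 0;
  int c = (x > 0 && y > 0) ? raw[(y - 1) * width + x - 1] : 0;
  switch (f) {
    case Filter::Sub:   return static_cast<std::uint8_t>(a);
    case Filter::Up:    return static_cast<std::uint8_t>(b);
    case Filter::Paeth: return paethPredict(a, b, c);
  }
  return 0;
}

// Replaces each byte with (value - prediction) modulo 256.
std::vector<std::uint8_t> applyFilter(const std::vector<std::uint8_t>& raw,
                                      std::size_t width, Filter f) {
  std::vector<std::uint8_t> out(raw.size());
  std::size_t height = width ? raw.size() / width : 0;
  for (std::size_t y = 0; y < height; ++y)
    for (std::size_t x = 0; x < width; ++x)
      out[y * width + x] = static_cast<std::uint8_t>(
          raw[y * width + x] - predict(raw, width, y, x, f));
  return out;
}

// Inverse: rebuilds the raw bytes in row-major order, so every neighbour
// used by the predictor has already been reconstructed.
std::vector<std::uint8_t> undoFilter(const std::vector<std::uint8_t>& filtered,
                                     std::size_t width, Filter f) {
  std::vector<std::uint8_t> raw(filtered.size());
  std::size_t height = width ? filtered.size() / width : 0;
  for (std::size_t y = 0; y < height; ++y)
    for (std::size_t x = 0; x < width; ++x)
      raw[y * width + x] = static_cast<std::uint8_t>(
          filtered[y * width + x] + predict(raw, width, y, x, f));
  return raw;
}
```

The filtered buffer is what you would then hand to the compressor (for lz4, for instance, `LZ4_compress_default`); the receiver decompresses and calls `undoFilter` with the same filter and width. Which filter works best depends on the data, so it is worth measuring each one.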

score 1 · Accepted Answer

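You can create your own. If the data in the rows is similar, you can create a resource/index map, which greatly reduces the size, like this:

Original file:
Row 1: 1212, 34,45,1212,45,34,56,45,56
Row 2: 34,45,1212,78,54,87,....

You can create a list of unique values and then replace each value with its index:

34,45,54,56,78,87,1212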

Row 1: 6,0,1,6,1,0,......

This can save you 30% or more of the data transfer, but it depends on how redundant the data is.

Update

Here is a simple implementation:

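#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using DataTable = std::vector<std::vector<int>>; // assuming a 2D vector implementation

// Returns the position of value in the sorted unique list, or -1 if absent.
int Find(const std::vector<int>& uniqueValues, int value) {
  auto it = std::lower_bound(uniqueValues.begin(), uniqueValues.end(), value);
  if (it != uniqueValues.end() && *it == value)
    return static_cast<int>(it - uniqueValues.begin());
  return -1;
}

std::string Compress(const DataTable& my2dData) {
  // create sorted list of unique values
  std::vector<int> uniqueValues;
  for (const auto& row : my2dData)
    uniqueValues.insert(uniqueValues.end(), row.begin(), row.end());
  std::sort(uniqueValues.begin(), uniqueValues.end());
  uniqueValues.erase(std::unique(uniqueValues.begin(), uniqueValues.end()),
                     uniqueValues.end());

  // create indexes: one comma-separated line per row
  std::string indexMap;
  for (const auto& row : my2dData) {
    std::string tmpRow;
    for (int value : row) {
      if (!tmpRow.empty()) tmpRow += ",";
      tmpRow += std::to_string(Find(uniqueValues, value));
    }
    indexMap += tmpRow + "\n";
  }

  // create file to transfer: the "i:" line lists the unique values,
  // "d:" marks the start of the index rows
  std::string fileCompressed = "i:";
  for (std::size_t k = 0; k < uniqueValues.size(); ++k) {
    if (k > 0) fileCompressed += ",";
    fileCompressed += std::to_string(uniqueValues[k]);
  }
  fileCompressed += "\nd:" + indexMap;
  return fileCompressed;
}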

Now on the receiving end you just do the reverse: if a line starts with "i" you get the index, and if it starts with "d" you get the data.
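
A minimal sketch of that receiving side might look like the following. It assumes a specific text layout that the answer only outlines: a first line `i:` followed by the comma-separated unique values, then `d:` followed by the first row of indexes, with each further row on its own `\n`-terminated line. `Decompress` and `parseInts` are hypothetical names used only for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated line into integers, skipping empty tokens.
static std::vector<int> parseInts(const std::string& line) {
  std::vector<int> out;
  std::stringstream ss(line);
  std::string tok;
  while (std::getline(ss, tok, ',')) {
    if (!tok.empty()) out.push_back(std::stoi(tok));
  }
  return out;
}

// Rebuilds the 2D data from the assumed "i:" / "d:" text layout.
// Throws std::out_of_range if an index does not exist in the value list.
std::vector<std::vector<int>> Decompress(const std::string& fileCompressed) {
  std::vector<int> uniqueValues;
  std::vector<std::vector<int>> rows;
  std::stringstream in(fileCompressed);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    if (line.rfind("i:", 0) == 0) {        // line holding the unique values
      uniqueValues = parseInts(line.substr(2));
      continue;
    }
    if (line.rfind("d:", 0) == 0) line = line.substr(2);  // first index row
    if (line.empty()) continue;            // blank lines carry no row
    std::vector<int> row;
    for (int idx : parseInts(line)) row.push_back(uniqueValues.at(idx));
    rows.push_back(row);
  }
  return rows;
}
```

Note that this sketch drops empty rows, and a text encoding like this is only smaller than the raw bytes when the indexes are much shorter than the values they replace.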
