My application problem is this: I have around 500 images, and 1 or 2 pairs of them may be completely identical, meaning the files' checksums are equal. My eventual goal is to find out which ones are the repeated image pairs.
However, I now have to apply a compression algorithm to these 500 images, because uncompressed they occupy too much disk space. Compression breaks the checksums, so I cannot use the checksums of the compressed files to find the repeated image pairs.
Fortunately, my compression algorithm is lossless, which means the restored uncompressed images can still be hashed somehow. But I want to do this in memory, without much disk write access. So my problem is: how do I efficiently find repeated images among a large number of image files, entirely in memory?
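For context, decoding in memory is not the hard part. If the compressed blobs were in a standard format such as PNG, something like this would restore a cv::Mat without touching the disk (a sketch; decodeInMemory is just an illustrative name), and my own lossless codec would need an equivalent in-memory decode step:

    #include <opencv2/imgcodecs.hpp>
    #include <vector>

    // Decode a compressed image held in a memory buffer straight into a
    // cv::Mat; no temporary file is written. Assumes the buffer holds a
    // format cv::imdecode understands (e.g. PNG).
    cv::Mat decodeInMemory(const std::vector<unsigned char>& compressed) {
        return cv::imdecode(compressed, cv::IMREAD_UNCHANGED);
    }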
I use OpenCV often, but any answer is fine as long as it is efficient and saves nothing to disk. Python/Bash code is also acceptable; C/C++ and OpenCV are preferred.
I can think of using OpenCV's cv::Mat with std::hash, but std::hash won't work on it directly: I would have to specialize std::hash<cv::Mat> myself, and I don't know how to do that properly yet.
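Something along these lines is what I have in mind, i.e. hashing the raw pixel bytes (a rough, untested sketch; the row-by-row walk is because cv::Mat rows may be padded, and the hash function itself is an arbitrary choice):

    #include <opencv2/core.hpp>
    #include <cstddef>
    #include <functional>

    namespace std {
    template <>
    struct hash<cv::Mat> {
        size_t operator()(const cv::Mat& m) const {
            // Mix in shape and type so e.g. a 2x8 and a 4x4 Mat differ.
            size_t seed = std::hash<int>{}(m.type())
                        ^ (std::hash<int>{}(m.rows) << 1)
                        ^ (std::hash<int>{}(m.cols) << 2);
            const size_t rowBytes = m.cols * m.elemSize();
            for (int r = 0; r < m.rows; ++r) {
                // Walk row pointers: rows of a Mat need not be contiguous.
                const unsigned char* p = m.ptr<unsigned char>(r);
                for (size_t i = 0; i < rowBytes; ++i)
                    seed = seed * 131 + p[i];  // simple polynomial hash
            }
            return seed;
        }
    };
    }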
Of course, I could always brute-force it:

    for each pair (img1, img2) in my images:
        // operator== on cv::Mat is element-wise and returns a Mat, not a
        // bool, so an exact-equality test needs something like:
        if (img1.size() == img2.size() && img1.type() == img2.type() &&
            cv::norm(img1, img2, cv::NORM_INF) == 0)
            print img1 and img2 are identical
But this is extremely inefficient: it is quadratic in the number of images, and each comparison scans every pixel of both images.
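What I imagine the efficient version looks like is bucketing by hash first, so that pixel-by-pixel comparison only ever happens between hash collisions. A rough sketch of what I have in mind (it relies on the std::hash<cv::Mat> specialization sketched above; findDuplicates is just an illustrative name):

    #include <opencv2/core.hpp>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    void findDuplicates(const std::vector<cv::Mat>& images) {
        // Bucket image indices by hash value: one pass over all images.
        std::unordered_map<std::size_t, std::vector<std::size_t>> buckets;
        std::hash<cv::Mat> hashMat;
        for (std::size_t i = 0; i < images.size(); ++i)
            buckets[hashMat(images[i])].push_back(i);

        // Only images whose hashes collide are compared pixel-by-pixel.
        for (const auto& kv : buckets) {
            const std::vector<std::size_t>& idx = kv.second;
            for (std::size_t a = 0; a < idx.size(); ++a)
                for (std::size_t b = a + 1; b < idx.size(); ++b) {
                    const cv::Mat& m1 = images[idx[a]];
                    const cv::Mat& m2 = images[idx[b]];
                    // A hash collision can be a false positive, so verify.
                    if (m1.size() == m2.size() && m1.type() == m2.type() &&
                        cv::norm(m1, m2, cv::NORM_INF) == 0)
                        std::cout << "images " << idx[a] << " and " << idx[b]
                                  << " are identical\n";
                }
        }
    }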
Note that my problem is not an image-similarity problem; it is an in-memory hashing problem.