
Possible Duplicate:
What is a good way to check if an image is unique using PHP?

A user uploads an image (png, jpg, gif) via a form. I am using hash_file to check against the db to see whether the image has already been uploaded, but I am now noticing that the hashes are not unique.

Is this a bug or should I be using something else to generate a unique ID for the files?

I guess the workaround would be md5(filesize($file) . $hash)?

UPDATE: From the logs. The first set uses md5_file; the second uses hash_file with sha256.

HASH: SELECT SiteID FROM tbl_image_hashes WHERE SiteID = 0 AND Hash = 'd41d8cd98f00b204e9800998ecf8427e'
HASH: SELECT SiteID FROM tbl_image_hashes WHERE SiteID = 0 AND Hash = 'd41d8cd98f00b204e9800998ecf8427e'

HASH: SELECT SiteID FROM tbl_image_hashes WHERE SiteID = 0 AND Hash = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
HASH: SELECT SiteID FROM tbl_image_hashes WHERE SiteID = 0 AND Hash = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
HASH: SELECT SiteID FROM tbl_image_hashes WHERE SiteID = 0 AND Hash = 'e3b0c44298fc1c
20130117T231016: booru.pixymedia.us/utilities/batchExistingUpload.php
HASH: SELECT SiteID FROM tbl_image_hashes WHERE SiteID = 0 AND Hash = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
HASH: SELECT SiteID FROM tbl_image_hashes WHERE SiteID = 0 AND Hash = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
HASH: SELECT SiteID FROM tbl_image_hashes WHERE SiteID = 0 AND Hash = 'e3b0c44298fc1c

And no, the SQL is right. I've uploaded 3,000 files successfully with this function.

This is the hash generating code:

$fileHash = hash_file("sha256",$FILE["tmp_name"]);

$FILE is basically an entry from $_FILES; it's just what the function parameter is named.


3 Answers


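d41d...427e and e3b0...b855 are the MD5 and SHA-256 sums of the empty string (i.e. md5("") and hash("sha256", "")). The fact that you have these in your database suggests something is wrong in your code; you are probably hashing the wrong file name at some point.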
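One way to keep such hashes out of the table is to refuse to hash anything that is not a successfully uploaded, non-empty file. The following is only a minimal sketch: `hashUploadedImage` is a hypothetical helper, not the asker's actual function, and it assumes `$file` is a single entry from `$_FILES`.

```php
<?php
// Hypothetical helper (not the asker's code): returns the hash of an
// uploaded file, or null when there is nothing sensible to hash.
function hashUploadedImage(array $file, string $algo = 'sha256'): ?string
{
    // A failed upload reports a non-zero error code and usually has no tmp_name.
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return null;
    }

    $path = $file['tmp_name'] ?? '';

    // Refuse missing paths and zero-byte files, which would otherwise
    // produce the empty-input hash.
    // In a real upload handler, also consider checking is_uploaded_file($path).
    if ($path === '' || !is_file($path) || filesize($path) === 0) {
        return null;
    }

    $hash = hash_file($algo, $path);
    return $hash === false ? null : $hash;
}
```

The caller can then treat a `null` result as an upload error instead of running the duplicate check against the database.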

answered 2013-01-18T05:07:01.250

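The problem with using the image file data is that the same image can be represented in many ways. This is especially true of GIF, where the color table can be in any order and the result is the same.

You should probably come up with a way to hash the image itself. You could do that by reading the color of each pixel and generating some kind of hash from it. Alternatively, you could try loading the image with GD, having it "normalize" the image by outputting it with imagegd(), and then using that output to check for uniqueness.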
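The per-pixel idea could be sketched roughly as follows. This assumes the GD extension is available; `pixelHash` is a hypothetical name, and the choice of SHA-256 over the width, height, and RGBA values is an illustration rather than anything the answer prescribes.

```php
<?php
// Hypothetical helper: hash the decoded pixels instead of the file bytes.
// Requires the GD extension.
function pixelHash(string $path): ?string
{
    $data = @file_get_contents($path);
    if ($data === false || $data === '') {
        return null; // unreadable or empty file: nothing to hash
    }

    $img = @imagecreatefromstring($data);
    if ($img === false) {
        return null; // not an image format GD can decode
    }

    $width  = imagesx($img);
    $height = imagesy($img);

    $ctx = hash_init('sha256');
    // Include the dimensions so a 2x8 and a 4x4 image cannot collide trivially.
    hash_update($ctx, $width . 'x' . $height . ':');

    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            // imagecolorsforindex() resolves palette indexes to actual
            // colors, so the ordering of a GIF color table does not matter.
            $c = imagecolorsforindex($img, imagecolorat($img, $x, $y));
            hash_update($ctx, pack('C4', $c['red'], $c['green'], $c['blue'], $c['alpha']));
        }
    }

    return hash_final($ctx);
}
```

Looping over every pixel in PHP is slow for large images, and this still treats a resized or recompressed (lossy) copy as a different image; it only removes differences in how identical pixels are encoded.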

answered 2013-01-18T04:02:31.583

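If you are getting the same hash for different files, consider these possibilities:

  1. The hash is being generated incorrectly (=> check the input! <=); or,
  2. The hash used is of insufficient quality (SHA-x is sufficient); or,
  3. The hash implementation is broken (doubtful, and not the case here); or,
  4. The files really do have the same content (established to be false)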

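The odds of an accidental SHA-x collision are extremely small; here is a probability table, although it doesn't quite convey how improbable this is. This article on 160-bit hashes has a more comparable scale at the bottom: the odds of being hit by a meteor are higher!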
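To put a rough number on "extremely small", the usual birthday-bound approximation p ≈ n(n-1)/2 / 2^bits can be evaluated directly. This is only a back-of-the-envelope sketch: `collisionProbability` is a hypothetical helper, the 3,000 comes from the file count mentioned in the question, and the formula assumes the hash behaves like a uniformly random function.

```php
<?php
// Birthday-bound approximation for the chance that any two of $n
// distinct inputs share a hash of $bits bits: p ~ n(n-1)/2 / 2^bits.
// Only meaningful while the result is much smaller than 1.
function collisionProbability(int $n, int $bits): float
{
    $pairs = $n * ($n - 1) / 2;   // number of distinct pairs of files
    return $pairs / (2 ** $bits); // 2 ** 256 overflows int and becomes a float
}

// 3,000 files hashed with SHA-256: on the order of 1e-71.
printf("%.2e\n", collisionProbability(3000, 256));
```

At that scale, repeated identical hashes point to identical input rather than to bad luck.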

In any case, #1 is indeed the culprit.

Hint: hash("sha256", "")
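The hint can be turned into a quick check for rows that were stored from empty input. A small sketch, where `isEmptyInputHash` is a hypothetical helper name and only the two algorithms from the question (MD5 and SHA-256) are covered:

```php
<?php
// Hypothetical helper: true when $hash is what MD5 or SHA-256 produce
// for zero bytes of input, i.e. when nothing was actually hashed.
function isEmptyInputHash(string $hash): bool
{
    $emptyHashes = [
        md5(''),            // MD5 of the empty string
        hash('sha256', ''), // SHA-256 of the empty string
    ];
    return in_array(strtolower($hash), $emptyHashes, true);
}

// The two values that keep repeating in the question's log:
var_dump(isEmptyInputHash('d41d8cd98f00b204e9800998ecf8427e'));
var_dump(isEmptyInputHash('e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'));
```

Both calls report true, which matches the diagnosis that the hashing code received no data for those uploads.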

answered 2013-01-18T04:18:46.000