
I'm curious why the result of SHA256 can be saved in a binary(32), but it takes a varchar(64) to save the same result as text.

I mean, 256 bits are 32 bytes, so saving it inside a binary(32) makes perfect sense. But then why does saving it in a varchar require an extra byte for each byte?
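To illustrate with Python's hashlib (the input string is arbitrary, just a sanity check):

```python
import hashlib

# SHA-256 always produces 32 raw bytes, but its hex form is 64 characters
h = hashlib.sha256(b"hello")
print(len(h.digest()))     # 32 -> fits in binary(32)
print(len(h.hexdigest()))  # 64 -> needed to store it as text
```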


1 Answer


Let's start at the beginning and see what a cryptographic hash function is and what its output actually is:

A cryptographic hash function is a hash function, that is, an algorithm that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value.

That means that we get a sequence of 1s and 0s back. To save that sequence correctly, you have to use MySQL's binary column type, since it doesn't store any information about how to represent the saved data to the user - there is no encoding associated with it. That also means that when you try to view the data, you'll most likely see garbled characters, because GUI programs will attempt to render the stored value as an ASCII-encoded string (which is wrong).

I'll skip the reasons why the hash value is represented as a number, but the point is that it is. And it's a hexadecimal number. Let's take an example byte:

10101111 = that's decimal 175 or hexadecimal AF.
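A quick Python sketch of that conversion (using the example byte above):

```python
b = 0b10101111         # the example byte, written as a binary literal
print(b)               # 175 (decimal)
print(format(b, "X"))  # AF (hexadecimal)
```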

Sure, ASCII code 175 maps to something, but it will most likely be a weird character depending on the code page in use. The problem with ASCII is that codes above 127 are arbitrary, which led to inventing code pages, which led to inventing Unicode, etc., so I'll skip it for now.

Point is, you can't rely on ASCII displaying 10101111 correctly in every scenario. That means that 175 will have to be displayed using 3 bytes, not 1. Why? Because each digit of 175 - the '1', the '7' and the '5' - has to be displayed using its own byte.

That means that you can display your hash value as a decimal number. It also means you can display it as a hexadecimal number, which is significantly shorter.

Let's take 10101111 again.

In decimal it's 175, takes 3 bytes to show it on screen - 1 for 1, 1 for 7 and 1 for 5. In hexadecimal it's AF, takes 2 bytes to show it on screen - significantly shorter.

Each byte, when translated to hex with a leading zero where needed, is exactly 2 digits wide. With decimal numbers that's not the case - the width varies from 1 to 3 digits. Ergo, your message is fixed-width, and it uses only the digits 0-9 and the letters A-F, which sit at the same positions in every ASCII code page, so they'll look the same everywhere.
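You can see the fixed-width property directly (the three byte values here are arbitrary examples):

```python
# Hex padded to 2 digits is always the same width; decimal width varies
for byte in (0x0A, 0xAF, 0xFF):
    print(format(byte, "02X"), str(byte))
# 0A 10
# AF 175
# FF 255
```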

So when you take AF and display it in ASCII, you need 1 byte for A and 1 byte for F. A SHA-256 hash has 32 bytes, each represented by 2 hex digits: 32 × 2 = 64 bytes.
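The same arithmetic in Python, converting each of the 32 digest bytes to its 2-digit hex form (the input string is arbitrary):

```python
import hashlib

digest = hashlib.sha256(b"example").digest()               # 32 raw bytes
hex_str = "".join(format(byte, "02X") for byte in digest)  # 2 chars per byte
print(len(digest), len(hex_str))                           # 32 64
```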

The only mistake you probably made was using varchar(64). Using varchar for storing hashes is wasteful when you know the hash width. Using char would be better, because you wouldn't waste the extra byte that a varchar column uses to record the string length.

Hopefully, this clears it up a bit. It's actually simpler than it sounds :)

answered 2012-10-02T12:56:23.757