It's not the integer overflow that's the problem. That's like driving over lava, having your car explode, and concluding that you didn't buy the right kind of gas.
The real problem is that you can't "crack" hash algorithms. There's one big reason why:
Information Theory
There's a term in information theory known as Shannon entropy. (Fair warning: that article is not easy to get through.) The quick version is that there's a minimum number of bits required to encode any given chunk of information.
This site has a calculator that claims to determine the amount of entropy (i.e. the minimum number of bits) required to losslessly encode a given text. I provided it with this chunk of artisanal filler text:
Meggings forage bitters tofu, Wes Anderson food truck craft beer
iPhone. Single-origin coffee scenester narwhal, mumblecore mlkshk jean
shorts chia trust fund art party pour-over you probably haven't heard
of them bitters. Polaroid craft beer Intelligentsia, vinyl Marfa
Brooklyn umami.
Assuming int
is 32 bits on your system, you only have 32 bits of space to encode any given file. But that chunk above -- not too long compared to what I might have used, like War and Peace or the US Code -- requires 1472 bits at minimum to encode if you want to have any hope of reconstructing the text.
(Commentor templatetypedef points to Kolmogorov complexity (another good explanation of that concept), which is an even better way of representing the information content of a string and the futility of cracking a hash.)
So information-theoretically (and leaving aside cheating, like having a pre-filled compression dictionary), it's impossible to reconstruct those few (simple handcrafted locally-sourced) sentences from a 32-bit int. It's a basic law of the universe, unfortunately. Ain't gonna happen.
Practical example, from code
Another commentor mentions the Pigeonhole Principle -- the simple idea that, if you have N slots (in this case 2^32), you can't put more than N things in them without putting two or more things in the same slot.
Let's take your hash function:
public static int encode(String file) {
int hash = 0;
file = file.toUpperCase();
for(int i = 0; i < file.length(); i++) {
hash = (hash * 61 + file.charAt(i)) - 32;
}
return hash;
}
Specifically, this line:
file = file.toUpperCase();
I want to hash two files:
mary had a little lamb
Mary Had A Little Lamb
What will their hash values be? Think about it.
(Sidenote: Even with all that said, you are overflowing int. :) Modular arithmetic is your friend if you want to do stuff like that.)