Essentially a a friend of mine locked himself out of an encrypted container he created. He must have made a typo when entering his password, because he cannot access the container. We know what it is supposed to be, and the actual password is sure to be a variation of this and/or very similar to it. I'm looking for code or a white paper dealing with this notion of "fuzzy" password cracking given a known portion of the password or known pattern that the password follows. The language is unimportant. I've already developed a way to brute force the password prompt, what I need is to develop an algorithm that will let me intelligently attempt to crack it so I don't have to try every possible combination. I would think someone else has already done this, however. I understand the concepts behind this to an extent, but am looking for code or a white paper where someone might have already solved this issue.
UPDATE So I've constructed a dictionary (Python, but feel free to send examples in any language) using characters that might be in the passphrase. Considerations were keystroke proximity on a standard QWERTY keyboard, 1337 speak equivalents, and an accidental '/' character for each letter since it is near the shift key. From there the sample pass phrase is supplied, and each letter is tried. This is following this example: http://code.activestate.com/recipes/535171-password-cracker/
import os
from commands import getoutput
known = {
'_': ('_', ' ', '-', '.', '/'),
'b': ('b', 'B', '3', '8', '*', 'v', 'V', 'n', 'N', 'g', 'G', 'h', 'H', ' ', '/'),
'g': ('g', 'G', '6', '^', 'f', 'F', 'h', 'H', 'b', 'B', 'v', 'V', 't', 'T', '/'),
'l': ('l', 'L', '1', '!', ';', ':', 'k', 'K', 'o', 'O', '.', '>', ',', '<', 'p', 'P', '/'),
'e': ('e', 'E', '3', '#', '4', '$', 'r', 'R', 'w', 'W', 'd', 'D', '/'),
'h': ('h', 'H', '4', '$', 'g', 'G', 'j', 'J', 'y', 'Y', 'b', 'B', 'n', 'N', '/'),
'i': ('i', 'I', '1', '|', '!', '\\', 'u', 'U', 'o', 'O', 'k', 'K', '8', '*', '9', '(', '/'),
't': ('t', 'T', '7', '&', '+', 'r', 'R', 'y', 'Y', 'g', 'G', '4', '5', '%', '6', '^', '/'),
'r': ('r', 'R', 'e', 'E', 't', 'T', 'f', 'F', '4', '$', '5', '%', '/'),
}
command = 'open-sesame %s' # hey, use your imagination ;)
# I obviously supplied only needed letters for this example, I can't tip you
# off to the real pass phrase ;) This conveys the general idea....
passwdBasic = 'Big_Leg_Hitter'
def main():
arrays = [known[ltr] for ltr in passwdBasic]
start = [ltrs[0] for ltrs in arrays]
end = [ltrs[-1] for ltrs in arrays]
indexes = [0] * len(arrays)
maxes = [len(ltrs)-1 for ltrs in arrays]
chrs = [ltrs[i] for ltrs, i in zip(arrays, indexes)]
while chrs != end:
passx = ''.join(chrs)
open('tries.txt', 'a+').write(passx + '\n')
out = getoutput(command)
if 'wrong password' not in out:
print 'GOT IT!', passx
return
# Next letter
for i in range(len(indexes)-1, -1, -1):
if indexes[i] <= maxes[i]-1:
indexes[i] += 1
break
else:
indexes[i] = 0
# Make up the chrs
chrs = [ltrs[i] for ltrs, i in zip(arrays, indexes)]
if __name__ == '__main__':
main()
The fictitious 'open-sesame' is a modified utility that mounts this particular type of encrypted volume, it was not written in python but has been made into a command line tool so that this script can interact with it.
A couple of challenges / research directions:
- If the '/' character was accidentally hit instead of the shift key, this would actually add a character to the pass phrase and thus could appear before any of the letters. This needs to be accounted for in the solution.
- It would be nice to integrate this with the spellchecker utility proposed by @rrenaud: http://norvig.com/spell-correct.html
- I'm fascinated by the application of Baye's Theorum of probability statistics which was applied to solve the spellcheck problem; I'm wondering if any research is available on erroneous keystrokes and the probability of hitting certain keys over others when typing certain words and/or phrases. This logic could be applied to password cracking in much the same fashion as the spellcheck utility, which benefits from known lists of common misspellings. I have no erroneous keystroke data with which to "train" a neural net utility.
I appreciate all the great help, I just wanted to share where I'm at so far for everyone's benefit.