algorithm - Are there any known algorithms, open source programs, or white papers for cracking a password given a known portion of the password?

Question

Essentially a a friend of mine locked himself out of an encrypted container he created. He must have made a typo when entering his password, because he cannot access the container. We know what it is supposed to be, and the actual password is sure to be a variation of this and/or very similar to it. I'm looking for code or a white paper dealing with this notion of "fuzzy" password cracking given a known portion of the password or known pattern that the password follows. The language is unimportant. I've already developed a way to brute force the password prompt, what I need is to develop an algorithm that will let me intelligently attempt to crack it so I don't have to try every possible combination. I would think someone else has already done this, however. I understand the concepts behind this to an extent, but am looking for code or a white paper where someone might have already solved this issue.

UPDATE So I've constructed a dictionary (Python, but feel free to send examples in any language) using characters that might be in the passphrase. Considerations were keystroke proximity on a standard QWERTY keyboard, 1337 speak equivalents, and an accidental '/' character for each letter since it is near the shift key. From there the sample pass phrase is supplied, and each letter is tried. This is following this example: http://code.activestate.com/recipes/535171-password-cracker/

import os
from commands import getoutput

known = {
    '_': ('_', ' ', '-', '.', '/'),
    'b': ('b', 'B', '3', '8', '*', 'v', 'V', 'n', 'N', 'g', 'G', 'h', 'H', ' ', '/'),
    'g': ('g', 'G', '6', '^', 'f', 'F', 'h', 'H', 'b', 'B', 'v', 'V', 't', 'T', '/'),
    'l': ('l', 'L', '1', '!', ';', ':', 'k', 'K', 'o', 'O', '.', '>', ',', '<', 'p', 'P', '/'),
    'e': ('e', 'E', '3', '#', '4', '$', 'r', 'R', 'w', 'W', 'd', 'D', '/'),
    'h': ('h', 'H', '4', '$', 'g', 'G', 'j', 'J', 'y', 'Y', 'b', 'B', 'n', 'N', '/'),
    'i': ('i', 'I', '1', '|', '!', '\\', 'u', 'U', 'o', 'O', 'k', 'K', '8', '*', '9', '(', '/'),
    't': ('t', 'T', '7', '&', '+', 'r', 'R', 'y', 'Y', 'g', 'G', '4', '5', '%', '6', '^', '/'),
    'r': ('r', 'R', 'e', 'E', 't', 'T', 'f', 'F', '4', '$', '5', '%', '/'),
}

command = 'open-sesame %s' # hey, use your imagination ;)
# I obviously supplied only needed letters for this example, I can't tip you 
# off to the real pass phrase ;) This conveys the general idea....
passwdBasic = 'Big_Leg_Hitter'

def main():
    arrays = [known[ltr] for ltr in passwdBasic]
    start = [ltrs[0] for ltrs in arrays]
    end = [ltrs[-1] for ltrs in arrays]
    indexes = [0] * len(arrays)
    maxes = [len(ltrs)-1 for ltrs in arrays]
    chrs = [ltrs[i] for ltrs, i in zip(arrays, indexes)]
    while chrs != end:
        passx = ''.join(chrs)
        open('tries.txt', 'a+').write(passx + '\n')
        out = getoutput(command)
        if 'wrong password' not in out:
            print 'GOT IT!', passx
            return
        # Next letter
        for i in range(len(indexes)-1, -1, -1):
            if indexes[i] <= maxes[i]-1:
                indexes[i] += 1
                break
            else:
                indexes[i] = 0
        # Make up the chrs
        chrs = [ltrs[i] for ltrs, i in zip(arrays, indexes)]


if __name__ == '__main__':
    main()

The fictitious 'open-sesame' is a modified utility that mounts this particular type of encrypted volume, it was not written in python but has been made into a command line tool so that this script can interact with it.

A couple of challenges / research directions:

If the '/' character was accidentally hit instead of the shift key, this would actually add a character to the pass phrase and thus could appear before any of the letters. This needs to be accounted for in the solution.
It would be nice to integrate this with the spellchecker utility proposed by @rrenaud: http://norvig.com/spell-correct.html
I'm fascinated by the application of Baye's Theorum of probability statistics which was applied to solve the spellcheck problem; I'm wondering if any research is available on erroneous keystrokes and the probability of hitting certain keys over others when typing certain words and/or phrases. This logic could be applied to password cracking in much the same fashion as the spellcheck utility, which benefits from known lists of common misspellings. I have no erroneous keystroke data with which to "train" a neural net utility.

I appreciate all the great help, I just wanted to share where I'm at so far for everyone's benefit.

score 3 · Accepted Answer

Have you tried enumerating edits from the given known faulty password? If you are only a few edits away (like it would be if it was a typo), there isn't really that many possibilities.

It's enumerating one level of edits is solved by this beautiful code by Norvig for spelling correction in the edits1() function. You could just apply that in a deepening depth first kind of way, so you try single edits first, and then edits of edits, and edits of edits of edits etc.

score 2 · Accepted Answer

First you need your character set. Were special characters a possibility? Upper and lower case? Numbers?
Next you need your original guess. Maybe 2 guesses, the second one would have your upper and lower cases inverted for the chance that the CAPS LOCK key was down. (fuzzY v. FUZZy)
Now you need to iterate all possibilities of... an edit, a delete and an insert.
- You have 3 sets of initial guesses
Now you decide how far you want to take this. Maybe 2 edits, 2 deletes, or 2 inserts... or 1 edit and 1 insert, or 1 delete and 1 insert etc.

Some JavaScript code to illustrate:

var charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
var guesses = {"passWORD":true, "PASSword":true};

function getProps(obj) {
 var lst = [];
 for(var g in obj)
  lst.push(g);
 return lst;
}
function transformEdit(guesses) {
 var lst = getProps(guesses);
 for(var x=0; x<lst.length; x++) {
  var guess = lst[x];
  for(var y=0; y<guess.length; y++) {
   for(var z=0; z<charset.length; z++) {
    guesses[guess.slice(0,y) + charset.charAt(z) + guess.slice(y+1)]=true;
   }
  }
 } 
}
function transformDelete(guesses) {
 var lst = getProps(guesses);
 for(var x=0; x<lst.length; x++) {
  var guess = lst[x];
  for(var y=0; y<guess.length; y++) {
   guesses[guess.slice(0,y) + guess.slice(y+1)]=true;
  }
 }
}
function transformInsert(guesses) {
 var lst = getProps(guesses);
 for(var x=0; x<lst.length; x++) {
  var guess = lst[x];
  for(var y=0; y<guess.length+1; y++) {
   for(var z=0; z<charset.length; z++) {
    guesses[guess.slice(0,y) + charset.charAt(z) + guess.slice(y)]=true;
   }
  }
 } 
}

// not the most efficient way
// but you'll get every possible edit, delete and insert
transformDelete(guesses);
transformInsert(guesses);
transformEdit(guesses);

getProps(guesses).length; //1759604

The code doesn't present the MOST EFFICIENT solution since there is a lot of overlap that could of been avoided, but this is to generate a password list for a one shot problem so...

I used the JS object property list as a Hash Set of guesses.

You could output the guesses to a password list by iterating the array returned by getProps(guesses).

algorithm - Are there any known algorithms, open source programs, or white papers for cracking a password given a known portion of the password?

2 回答 2

Related

Reference