
I have spent a few hours investigating the best way to use the email address instead of the username in Django authentication. This topic has been discussed many times, but the answers given are inconsistent.

1) The answer here points to a snippet that tells a username from an email address simply by checking for an '@' character. However, the maximum lengths of the email and username fields are not equal, and the answer does not take that into account.

2) The second answer from the same link, by S.Lott (13 votes), does some black magic with admin.site. I don't understand what the code is doing. Is this the accepted, short-and-sweet way of doing it?

3) Then I found this solution, which seems almost perfect (and makes sense to me):

username = uuid.uuid4().hex[:30]

It takes only the first 30 characters of a unique ID generated by Python's uuid module as the username. But there is still a chance of collision. Then I came across a post where someone claimed:

A base64 encoding of an md5 hash has 25 characters

If that's true, couldn't we take the Base64 encoding of an MD5 hash of the email address and guarantee 100% unique usernames that are also under 30 characters? If so, how could this be achieved?

Many thanks,


3 Answers


You can do it like this:

>>> from hashlib import md5
>>> h = md5('email@example.com').digest().encode('base64')[:-1]
>>> h
'Vlj/zO5/Dr/aKyJiOLHrbg=='
>>> len(h)
24

The [:-1] drops the last character because it's just a newline. The chance of collision is the same as for the MD5 hash itself, since you don't lose any information when you encode in Base64.

>>> original = md5('email@example.com').digest()
>>> encoded = original.encode('base64')
>>> original == encoded.decode('base64') 
True
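
The snippets above use Python 2, where str has an encode('base64') codec. As a rough Python 3 equivalent, the following sketch uses the base64 module instead; the helper name email_to_username is hypothetical and only for illustration.

```python
import base64
import hashlib


def email_to_username(email: str) -> str:
    """Return the Base64-encoded MD5 digest of an email address."""
    # Hypothetical helper name; not part of Django or the original answer.
    digest = hashlib.md5(email.encode("utf-8")).digest()  # 16 raw bytes
    # b64encode (unlike the Python 2 'base64' codec) appends no newline.
    return base64.b64encode(digest).decode("ascii")


username = email_to_username("email@example.com")
print(username)       # Vlj/zO5/Dr/aKyJiOLHrbg==
print(len(username))  # 24
```

Note that the result can still contain '/', '+' and '=', so it may need a URL-safe alphabet before being stored as a username.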
Answered 2012-06-18T19:28:23.843

MD5 hashes are always 16 bytes long, and Base64 encodes each group of 3 bytes as 4 characters. Thus 16 / 3 rounded up gives 6 groups of 3 bytes, and 6 times 4 = 24 characters for an MD5 hash encoded in Base64.
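
The arithmetic can be checked with a few lines of Python 3. This is only a sketch; the function name base64_length is made up for the example.

```python
import base64
import math


def base64_length(num_bytes: int) -> int:
    """Length of standard (padded) Base64 output for num_bytes of input."""
    # Hypothetical helper: every started group of 3 bytes yields 4 characters.
    return 4 * math.ceil(num_bytes / 3)


md5_digest_size = 16  # MD5 digests are always 16 bytes
computed = base64_length(md5_digest_size)
actual = len(base64.b64encode(bytes(md5_digest_size)))  # encode 16 zero bytes
print(computed, actual)  # 24 24
```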

However, note that the Wikipedia page linked above states:

However, it has since been shown that MD5 is not collision resistant.

So you cannot count on this method to give you guaranteed-unique usernames from email addresses. Producing the encoded hashes themselves is very easy with the help of the hashlib module:

>>> from hashlib import md5
>>> md5('foo@bar.com').digest().encode('base64').strip()
'862kBc6JC2+CBAlN6xLYqA=='
Answered 2012-06-18T19:42:26.287

A UUID is 128 bits long, so you can apply Base64 to it directly to get a 22-character string (after removing the fixed '==' padding, as Gumbo suggests in the comments on the question):

>>> import base64
>>> import uuid
>>> len(base64.urlsafe_b64encode(uuid.uuid4().bytes).rstrip('='))
22

Here, urlsafe_b64encode and the stripping of '=' are used to avoid characters that the User.username field does not like, including '/', '+' and '='.
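
In Python 3 the encoder returns bytes, so the padding is stripped with a bytes argument and the result decoded. A minimal sketch, with uuid_username as a hypothetical helper name:

```python
import base64
import string
import uuid


def uuid_username() -> str:
    """Return a 22-character URL-safe Base64 string built from a random UUID."""
    # Hypothetical helper name, used only for this illustration.
    raw = uuid.uuid4().bytes  # 16 bytes = 128 bits
    # 16 bytes encode to 24 characters ending in '=='; strip that padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


name = uuid_username()
url_safe_alphabet = set(string.ascii_letters + string.digits + "-_")
print(len(name))                       # 22
print(set(name) <= url_safe_alphabet)  # True: no '/', '+' or '='
```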

Also, a UUID has two fixed variant bits, '10' (hence the 17th character of the hex representation is always 8, 9, a or b), and four fixed version bits (the 13th hex character, which is '4' for uuid4); check the Wikipedia article for details.
Thus you can throw away those 4 + 2 = 6 fixed bits, along with 2 effective (random) bits, to get a 30-character hex string:

>>> s = uuid.uuid4().hex
>>> len(s[:12] + s[13:16] + s[17:])
30

This way you throw away only 2 effective bits, instead of the 8 lost by simply slicing with s[:30], so you can expect better uniqueness (the coding space shrinks to 1/4 of the UUID's, rather than 1/256).
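
To see which two hex digits are dropped, here is a small Python 3 sketch of the same slicing; compact_uuid_hex is a hypothetical name chosen for the example.

```python
import uuid


def compact_uuid_hex(u: uuid.UUID) -> str:
    """Drop the version and variant hex digits, leaving 30 hex characters."""
    # Hypothetical helper wrapping the slicing shown above.
    s = u.hex  # 32 lowercase hex characters
    # s[12] is the version digit, s[16] holds the two fixed variant bits.
    return s[:12] + s[13:16] + s[17:]


u = uuid.uuid4()
short = compact_uuid_hex(u)
print(u.hex[12])   # always '4' for uuid4
print(u.hex[16])   # one of '8', '9', 'a', 'b'
print(len(short))  # 30
```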

Answered 2012-06-19T06:35:30.637