Part 2
Despite its disadvantages, password-based authentication is still the most common way to identify users on the Internet. The biggest drawback of these methods is that they involve the users. People specify their login credentials whenever they register on a site, and people are known to be bad at both choosing and memorizing random strings. I once lost my wallet with my credit card in it, and one of the first questions the administrator at the bank asked me was whether I had my PIN written down on a piece of paper in the wallet. The fact that she asked probably means that people often write their PINs down instead of memorizing them. If so many people find it difficult to remember 4 digits, we can’t expect Internet users to memorize long random passwords with all sorts of characters.
We decided to do some research on one of the password sets publicly available on the Internet. In the previous part of our article, we looked at the length of the passwords in the set and the distribution of the characters used in them. This time, we’ll look at an even more interesting topic: the patterns people most often use when choosing passwords.
Patterns
Several sites suggest or even require a password that contains at least one lowercase letter, one uppercase letter, one digit and one special character. This should make people use more random passwords, or at least passwords which are less vulnerable to dictionary attacks. However, as we said earlier, people are quite bad at memorizing complicated or random passwords. In addition, we are addicted to patterns, so we will probably look for a password that follows some sort of scheme which helps us remember it. Knowledge of these patterns can help attackers make their brute-force attacks more effective.
To analyze the dataset, let’s first see how to determine the pattern a password belongs to, using the password dontPanic.42 as an example. Let’s apply the following modifications to the password:
- replace every lowercase letter of the English alphabet with the character a
- replace every uppercase letter of the English alphabet with the character A
- replace every digit with the character N
- finally, replace every character which wasn’t replaced by one of the previous rules with the character S. These are the special characters.
We should get the string aaaaAaaaaSNN using the password in our example. This is the pattern which the password belongs to.
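The four replacement rules can be sketched in a few lines of Python. This is only an illustration of the rules described here, not the tool we used for the analysis; the function name password_pattern is our own choice for the example.

```python
import string


def password_pattern(password: str) -> str:
    """Map a password to its pattern using the four replacement rules."""
    symbols = []
    for ch in password:
        if ch in string.ascii_lowercase:
            symbols.append("a")   # lowercase English letter
        elif ch in string.ascii_uppercase:
            symbols.append("A")   # uppercase English letter
        elif ch in string.digits:
            symbols.append("N")   # digit
        else:
            symbols.append("S")   # everything else is a special character
    return "".join(symbols)


print(password_pattern("dontPanic.42"))  # aaaaAaaaaSNN
```

Note that under these rules accented letters and spaces also count as special characters, because only the letters of the English alphabet are mapped to a and A.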
The first thing we looked at after we applied these modifications to both the control set and the dataset was the number of distinct patterns. We obtained 3363 different patterns from the 12864 human-created passwords and almost three times as many, 9984 different patterns, from the control set of random passwords. This shows that many people use some kind of rule when they construct their passwords.
Above you can see the function we used previously to examine the distribution of the characters, but this time applied to patterns. To define the function, we sort the patterns in descending order by the number of passwords in the dataset which belong to each pattern. Then we choose a password randomly from our dataset and start iterating through this sorted list of patterns. The function shows the probability that the pattern of the chosen password has already been reached. The red curve belongs to the dataset, the blue curve shows the function for the control set.
As you can see, about 60% of the passwords belong to only a few patterns. In fact, the 129 most common patterns cover 60% of all the passwords in the dataset. Attackers can definitely use these patterns to make their attacks more effective.
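The curve described above can be computed roughly as follows. This is a minimal sketch under our own naming (cumulative_coverage and patterns_needed are hypothetical helpers, and the five sample passwords are made up for the example); it is not the script that produced the figure.

```python
from collections import Counter
import string


def password_pattern(password):
    """Replace lowercase with a, uppercase with A, digits with N, the rest with S."""
    return "".join(
        "a" if ch in string.ascii_lowercase
        else "A" if ch in string.ascii_uppercase
        else "N" if ch in string.digits
        else "S"
        for ch in password
    )


def cumulative_coverage(passwords):
    """Fraction of passwords covered after each pattern, most common first."""
    counts = Counter(password_pattern(p) for p in passwords)
    total = sum(counts.values())
    coverage = []
    covered = 0
    for _pattern, count in counts.most_common():
        covered += count
        coverage.append(covered / total)
    return coverage


def patterns_needed(passwords, target):
    """Number of most common patterns needed to cover the target fraction."""
    for rank, fraction in enumerate(cumulative_coverage(passwords), start=1):
        if fraction >= target:
            return rank
    return None


# Made-up sample: three of the five passwords share the pattern AaaaaaSN.
sample = ["Passwd.1", "Secret!7", "Monkey#3", "abc123", "hello"]
print(cumulative_coverage(sample))
print(patterns_needed(sample, 0.6))
```

Run on the real dataset, patterns_needed(passwords, 0.6) would be the value we report as 129.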
The table below lists the 20 most common patterns in ascending order of frequency, together with the number of passwords which belong to each and their rate of occurrence in the dataset.
Pattern | Number of passwords | Rate of occurrence (%) |
AaSaaaNN | 48 | 0.37 |
ANaaaaaS | 51 | 0.4 |
ASaaaNNN | 53 | 0.41 |
AAaaNNSS | 54 | 0.42 |
NAaaaaaS | 59 | 0.46 |
AaaSaaaN | 62 | 0.48 |
SAaaaaaN | 62 | 0.48 |
AaaaNNSS | 66 | 0.51 |
SAaaaaNN | 67 | 0.52 |
ASaaaaaN | 83 | 0.65 |
SNAaaaaa | 83 | 0.65 |
ASaaaaNN | 155 | 1.21 |
AaaNNNNS | 169 | 1.32 |
AaaaNNNS | 183 | 1.43 |
AaaaSNNN | 286 | 2.23 |
AaaSNNNN | 380 | 2.96 |
AaaaaNNS | 484 | 3.77 |
AaaaaaNS | 600 | 4.68 |
AaaaaSNN | 1114 | 8.68 |
AaaaaaSN | 1119 | 8.72 |
Validation
You may remember that we divided our original set of passwords into two parts. The first part was used for the experiment, and we will use the second part to validate the results and test the knowledge we gained.
We could find 3465 passwords from the validation set using the patterns; the remaining 812 passwords would require a complete brute-force attack. The graph below shows the number of matched passwords as a function of the number of password candidates that need to be tested in the worst case.
Note that the number of required tests is based on worst-case scenarios; that is, we assume that if the password we are searching for belongs to a pattern, it is tested after all the other password candidates which belong to that pattern. Using the information about the distribution of the characters may lead to a result much faster than what is shown here.
Also, we assumed that there are 58 special characters which can be substituted into the patterns. In a real-world situation, an attacker would probably use fewer special characters, which would make it impossible to recover passwords containing rarely used special characters but would make the search much faster.
17% of these passwords require only 248083.1 million tests. You can compare this with a random password which contains 8 lowercase English letters and would require 208827 million tests in the worst case; such a password is considered weak by most sites.
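The worst-case figures can be reproduced with a short calculation. The sketch below assumes 26 letters per case, 10 digits and the 58 special characters mentioned above; the helper names are our own. With these assumptions, exhausting the two most common patterns (AaaaaaSN and AaaaaSNN) takes 248,083,130,880 tests, which is the 248083.1 million quoted above.

```python
# Assumed alphabet size for each pattern symbol: 26 lowercase letters,
# 26 uppercase letters, 10 digits and 58 special characters.
ALPHABET_SIZE = {"a": 26, "A": 26, "N": 10, "S": 58}


def pattern_search_space(pattern):
    """Number of password candidates which belong to the given pattern."""
    size = 1
    for symbol in pattern:
        size *= ALPHABET_SIZE[symbol]
    return size


def cumulative_worst_case(patterns):
    """Worst-case number of tests after exhausting each pattern in order."""
    total = 0
    totals = []
    for pattern in patterns:
        total += pattern_search_space(pattern)
        totals.append(total)
    return totals


# The two most common patterns from the table, most common first.
print(cumulative_worst_case(["AaaaaaSN", "AaaaaSNN"]))
# 8 random lowercase letters, for comparison.
print(pattern_search_space("aaaaaaaa"))
```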
As you can see, more than 50% of the passwords in the validation set were found quite early. This means that if the dataset represents real-world passwords well, the naïve brute-force algorithm can be significantly optimized using these patterns.
Use random passwords with a password-management system
In an ideal world, people would use a different password for every site, and all of these passwords would be at least 12 characters long and contain random characters from every character class mentioned above. Having one password per site is really important because some sites store passwords as plain text. An attacker can easily acquire even the most complex password from such web applications and then use it on other sites if you reuse the same password.
Sadly, for most of us it is nearly impossible to remember a good enough password for each site we have registered on. However, we can use software to remember our passwords for us. You should only use password-management software which asks for a master password or something similar. For example, saving passwords in an average FTP client is usually a bad idea, as all the information needed to recover the password is present on the computer itself. If a master password is required to decrypt the stored passwords, access to the computer alone is not enough to access the passwords.
Inform your users
It seems that some sites already take patterns into account when rating the strength of a password. You could use a password strength checker that rates passwords which belong to common patterns as weaker. Along with that, it is a good idea to let your users know why their passwords are considered weak, what the common patterns are and why they should avoid them. Generally, I believe it is essential to make people aware of the importance of choosing the right password.
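A pattern-aware check could look something like the sketch below. It is only an illustration: the set of common patterns is limited to the five most frequent ones from our table, and pattern_warning is a hypothetical helper, not part of any existing strength checker.

```python
import string

# The five most common patterns from the table in this article.
COMMON_PATTERNS = {"AaaaaaSN", "AaaaaSNN", "AaaaaaNS", "AaaaaNNS", "AaaSNNNN"}


def password_pattern(password):
    """Replace lowercase with a, uppercase with A, digits with N, the rest with S."""
    return "".join(
        "a" if ch in string.ascii_lowercase
        else "A" if ch in string.ascii_uppercase
        else "N" if ch in string.digits
        else "S"
        for ch in password
    )


def pattern_warning(password):
    """Return a message for the user if the password follows a common pattern."""
    pattern = password_pattern(password)
    if pattern in COMMON_PATTERNS:
        return (
            f"This password follows the very common pattern {pattern}, "
            "so attackers are likely to try it early."
        )
    return None


print(pattern_warning("Passwd.1"))
print(pattern_warning("x9$Kq!2mLp#4"))
```

Returning a message rather than a plain yes or no makes it easy to tell the user why the password was rated as weak.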
Random password generator
To help your users choose the right password, you can place a button next to the password fields on your site which creates a random password. We have similar functionality on some of our sites and it has proved to be useful and successful.
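On the server side, such a generator might be sketched as follows. This is not the code running on our sites; it is a minimal example which assumes Python's secrets module as the source of randomness, the 32 ASCII punctuation characters as special characters and a default length of 12.

```python
import secrets
import string

# Assumption: the special characters are the ASCII punctuation characters.
SPECIALS = string.punctuation
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + SPECIALS


def generate_password(length=12):
    """Generate a random password containing all four character classes."""
    if length < 4:
        raise ValueError("length must be at least 4 to fit every character class")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every class is present, so no position is predictable.
        if (
            any(c in string.ascii_lowercase for c in candidate)
            and any(c in string.ascii_uppercase for c in candidate)
            and any(c in string.digits for c in candidate)
            and any(c in SPECIALS for c in candidate)
        ):
            return candidate


print(generate_password())
```

Drawing every character from the full alphabet and retrying avoids placing, say, the digit always at the end, which would just create another pattern.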
Salted hash
Using patterns for a brute-force attack not only shortens the time required for the attack, but also allows the attacker to build a lookup table which is relatively small and effective. After building the lookup table, the attacker only needs to search for the hashes in it to find passwords, which can be done really fast.
To prevent this, it is a good idea to use a random salt when hashing passwords.
Let’s see an example! Imagine that you have a web application with 3 users, whose passwords are Passwd.1, Passwd.2 and Passwd.3. The hashes of these passwords are stored in the database. If an attacker somehow acquires the hashes and has a precomputed lookup table which contains the hash of every password belonging to the pattern AaaaaaSN, he or she only needs to search for the hashes in the table.
To prevent this, you can modify each password with a random string before creating the hash; this random string is the salt. For this example, let’s say that the salts for the three passwords are srfp, qwrm and kzgh respectively, and that the salt is appended to the password before hashing, so the hashes are created for the strings Passwd.1srfp, Passwd.2qwrm and Passwd.3kzgh. This makes the attacker’s lookup table useless, as it only contains the hashes of the unmodified passwords, so he or she would need to run a brute-force attack on every acquired hash. Creating a lookup table for every possible salt would require much more space: even for these simple salts, which contain 4 lowercase English letters, the space required to store the hashes would be 456976 times larger (26^4 = 456976), and salts can be much longer and more complex.
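In code, salting might look like the sketch below. Note that it does not append the salt to the password as in our simplified example: it uses PBKDF2 from Python's standard library, which takes the salt as a separate parameter and is deliberately slow. The 16-byte salt and the iteration count are assumptions for the example, not recommendations.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor, chosen only for this example


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# The same password hashed twice gives different digests because the salts differ.
salt_1, digest_1 = hash_password("Passwd.1")
salt_2, digest_2 = hash_password("Passwd.1")
print(digest_1 != digest_2)
print(verify_password("Passwd.1", salt_1, digest_1))
```

The salt is stored next to the hash in the database; it does not need to be secret, it only needs to be different for every user.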
Share your ideas
There are many questions left regarding the way people create their passwords. For example, I would be interested to know whether the language and culture of the users affect the passwords they use. We’d love to hear your questions and ideas on the topic. Also, if you have any tricks you use to create or manage your passwords, don’t hesitate to share them with the world. We are looking forward to hearing from you on our Facebook page.