Technical Security Insights

Let's Get Cracking: A Beginner's Guide to Password Analysis

February 14, 2019

The Focal Point Attack & Penetration team performs many internal penetration tests that culminate in a compromise of Windows Active Directory domains and access to the password hashes of all domain users. Like many teams that provide pen testing services, we have a high-powered GPU-based password-cracking rig that we use to recover high-value or time-sensitive passwords. But sometimes we’re on-site without access to our VPN, or we’re in the reporting window following a test and someone else is using the rig for an active test. Whatever the reason, we still get a lot of mileage from the classic password cracker, john, even in this age of GPU-based cracking with hashcat (which we also love but is not the focus of this post).

Whenever we compromise a domain and gain access to user passwords, we perform an analysis of the recovered plaintext using Robin Wood's handy password analyzer, pipal, to give our clients an idea of common weaknesses in their users' passwords. Sometimes it's a default initial password like “Today123” that never gets changed; sometimes it's a progression of users moving from "Password1" to "Password2" to "Password3"; sometimes it's finding that many users have the company name in their passwords.

We typically recommend an internal password auditing process to help weed out these weaknesses, but it can be tough to find resources to help our clients get started with password cracking. Most of the state-of-the-art tools in password cracking are focused on GPUs and hashcat, for good reason, but those are overkill for a systems administrator who just wants to eliminate all the “Winter2019” passwords (Happy New Year!). Instead, what they want is a Couch to 5k for password cracking, which is what we aim to provide in this guide.

Setting Up john

john does not require any special hardware. This guide was developed using a plain Xubuntu 18.04 system running in VirtualBox on a standard laptop computer. If you choose to use another Linux distribution or macOS, follow the installation instructions in john's doc/ directory. We do not recommend the stable john package shipped by many distributions, as it lacks many of the features of the Jumbo version maintained at https://github.com/magnumripper/JohnTheRipper. As the project itself puts it, "The ‘bleeding-jumbo’ branch (default) is based on 1.8.0-Jumbo-1 (but we are literally several thousands of commits ahead of it)."

On Ubuntu-based systems, you will need these packages:

sudo apt-get -y install build-essential git libssl-dev zlib1g-dev yasm libgmp-dev libpcap-dev pkg-config libbz2-dev ocl-icd-opencl-dev opencl-headers pocl-opencl-icd

Create or navigate to an appropriate directory and download john from the GitHub repo:

git clone https://github.com/magnumripper/JohnTheRipper.git

Navigate to the john src/ directory and build it:

./configure && make -s clean && make -sj4

Note that these commands do not install john system-wide; instead, you run john directly from the run/ directory.

Finding and Using Target Hashes

A common issue faced by people who want to learn password cracking is finding target hashes to use. When our clients want to start auditing passwords, it can be difficult to get buy-in from leadership to start downloading user passwords from domain controllers for educational purposes. For this exercise, we leaned on the excellent work of Troy Hunt and his Pwned Passwords database: "Pwned Passwords are 517,238,891 real world passwords previously exposed in data breaches."

One of the services Troy provides is a downloadable file with all these passwords in NT hash format, which administrators can use to blacklist weak passwords in their environments. While that is out of the scope of this guide, we can certainly take advantage of 500+ million NT hashes for learning purposes. Here we use the "ordered-by-count" version of the download, which ranks passwords by prevalence.

Before you get started, make sure you have enough disk space. The compressed file is 8GB and expands to 18GB, and splitting it into multiple smaller files (as we do below) temporarily needs about as much space again.


Download the file to an appropriate directory and then extract:

p7zip -d pwned-passwords-ntlm-ordered-by-count.7z

Screenshot 1

Looking at the first lines of the passwords file, you can see that it is just an NT hash and a count of how many times the password appears in public data breaches.

Screenshot 2
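Before splitting the file, it can be worth confirming that every line really follows this two-field layout. The hypothetical `count_malformed` helper below is a sketch that assumes each hash is 32 hexadecimal characters, and it tolerates Windows (CRLF) line endings in case the download uses them; it prints how many lines do not match.

```shell
#!/usr/bin/env bash
# Count lines that do NOT look like "<32 hex chars>:<decimal count>".
# An optional trailing carriage return is accepted in case the file
# uses Windows (CRLF) line endings.
count_malformed() {
  grep -cvE $'^[0-9A-Fa-f]{32}:[0-9]+\r?$' "$1"
}

# Example (file name as used in this guide):
#   count_malformed pwned-passwords-ntlm-ordered-by-count.txt
# A result of 0 means every line matched the expected layout.
```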

Five hundred million passwords is a little excessive for getting started with password cracking, so we broke the file into chunks of 20,000 lines. This is a realistic size for an Active Directory database.

split -l 20000 pwned-passwords-ntlm-ordered-by-count.txt hashes-

This creates about 26,000 files prefixed with "hashes-". hashes-aa contains the top 20,000 (in other words, the worst 20,000) passwords in the database. These are likely to be cracked with little trouble, which is helpful as you start working with john, because it lets you see how successful the various cracking modes are. We kept hashes-aa and hashes-zzamhr (the last file, holding the least common passwords in the database) and deleted all the others to limit clutter.

Screenshot 3
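The number of chunks and the size of the last one follow directly from the line count. A small sketch, using a hypothetical `chunk_stats` helper and the 517,238,891 passwords quoted earlier:

```shell
#!/usr/bin/env bash
# Hypothetical helper: given TOTAL input lines and a chunk size of LINES,
# print how many files `split -l LINES` creates and how many lines the
# final (possibly partial) chunk holds.
chunk_stats() {
  local total=$1 lines=$2
  local files=$(( (total + lines - 1) / lines ))   # ceiling division
  local last=$(( total - (files - 1) * lines ))
  echo "$files $last"
}

chunk_stats 517238891 20000
# 25862 18891
```

The 18,891 lines in the final chunk match the size of the last file reported in the results. GNU split also lengthens the suffix automatically once the two-letter names run out, which is why the last of these chunks ends up with a longer suffix such as zzamhr.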

Finally, before we could start cracking, we needed to massage the password files to make them more useful. A more realistic hashes file would have a username and NT hash separated by a colon.

awk -F: '$0!=""{print "user"NR":"$1}' hashes-aa >> hashes-aa_with_users.txt

awk -F: '$0!=""{print "zuser"NR":"$1}' hashes-zzamhr >> hashes-zzamhr_with_users.txt

These awk commands process each file by taking every non-blank line and printing "user" (or "zuser" for the second file) followed by the line number, a colon, and the NT hash (the first colon-separated field), dropping the count. The result is something more useful for our purposes.

Screenshot 4

A nice side effect of this modification is that during cracking, you can see how prevalent a password is based on the user number you’ve assigned.
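To see exactly what the transform produces, the same awk program can be run on a few sample lines. This sketch wraps it in a hypothetical `add_users` function, and the hashes and counts are illustrative placeholders. Because awk's `NR` counts every input line, including skipped blank ones, each user number stays equal to the hash's line number (its prevalence rank) in the original file.

```shell
#!/usr/bin/env bash
# Same transform as the commands above: skip blank lines and print
# "user<line number>:<first colon-separated field>", dropping the count.
add_users() {
  awk -F: '$0!=""{print "user"NR":"$1}' "$@"
}

# Illustrative input in the "NT-hash:count" layout, with a blank line.
printf '%s\n' 'FEDCBA9876543210FEDCBA9876543210:100' '' '0123456789ABCDEF0123456789ABCDEF:90' | add_users
# user1:FEDCBA9876543210FEDCBA9876543210
# user3:0123456789ABCDEF0123456789ABCDEF
```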

Time to Get Cracking

Finally, at long last, it’s time to crack some passwords. The first step should always be to run john in "incremental" mode, especially if you suspect a large number of weak passwords, as we did here with hashes-aa. Incremental mode is not a terribly accurate name: rather than simply counting through every possible string, it uses character-frequency statistics (charset files) to try the most likely candidates first, which makes it an effective first pass against very weak passwords.

~/tools/JohnTheRipper/run/john --format=NT --pot=./pwned.pot --fork=4 --incremental hashes-aa_with_users.txt

A couple of notes on the john command line above. We manually specified the hash type with "--format=NT", and we then used a POT file specific to this session with "--pot=./pwned.pot". The POT file is where john stores passwords that it has already cracked for display with the "--show" command. The "--fork=4" directive tells john to split the work over four CPU cores, and of course "--incremental" specifies the cracking mode.

Screenshot 5

As expected, incremental mode destroys these weak passwords. Clearly, the systems these passwords came from did not have complexity requirements, but that should not make corporate IT administrators with complex password requirements in place feel too comfortable. Our experience is that complexity requirements simply force the users to use "Blueberry1" and "Crystal123" instead, which is not much of an improvement.

Incremental mode will run forever, or until every password is cracked. You could compare this mode to making popcorn: when the passwords stop scrolling and the pauses between pops get longer, press Ctrl-C to exit john. You can then use the "--show" option to view the results.

~/tools/JohnTheRipper/run/john --format=NT --pot=pwned.pot hashes-aa_with_users.txt --show

Screenshot 6

The number of passwords cracked will depend on the hardware you use and the time you let it run. In preparing this guide, we let incremental mode run a little over an hour before the pops slowed to a few per minute. The results were in line with our expectations for the 20,000 worst passwords on the Internet.

Screenshot 7

We wouldn’t be able to count on this level of success with a corporate user population, and there were still 2,822 passwords to crack here, so we moved on to the next cracking mode: using wordlists and permutation rules to guess passwords. It is extraordinarily uncommon to find a user population that generates and uses random strings as passwords. Instead, users rely on passwords they can remember and modify them just enough to meet complexity requirements. For example, someone might want to use their child's name as a password, but then they find that "beyonce" does not meet complexity requirements. Add some capitalization and a birthday, though, and it works just fine (Beyonce0816). john ships with rules that take a base wordlist and perform common permutations like this to find these types of passwords.
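As a toy illustration only (this is not john's rule engine or rule syntax), the kind of permutation described above, capitalizing a base word and appending digits, can be sketched with awk. The `mangle` function and its comma-separated suffix list are hypothetical. To preview what john's real rules generate from a wordlist, john can print candidates instead of cracking, for example `john --wordlist=words.txt --rules=Wordlist --stdout`.

```shell
#!/usr/bin/env bash
# Toy word mangler: for each non-blank base word, print the word itself, a
# copy with the first letter capitalized, and that capitalized copy with each
# suffix appended. $1 is a comma-separated suffix list (hypothetical format).
mangle() {
  awk -v sfx="$1" '
    BEGIN { n = split(sfx, s, ",") }
    NF {
      cap = toupper(substr($0, 1, 1)) substr($0, 2)
      print $0
      print cap
      for (i = 1; i <= n; i++) print cap s[i]
    }'
}

echo beyonce | mangle "1,123,0816"
# beyonce
# Beyonce
# Beyonce1
# Beyonce123
# Beyonce0816
```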

One of the side effects of doing a lot of password cracking is that you’ll get a sense for which wordlists work best, and you can start to build your own wordlists based on past sessions if you are repeating a password audit for the same organization. There are also some very good collections of wordlists on the Internet to get you started.

We found the top 1,000,000 passwords list to be a good start, so we kicked off with that one.

~/tools/JohnTheRipper/run/john --format=NT --pot=./pwned.pot --fork=4 --rules=all --wordlist=10_million_password_list_top_1000000.txt hashes-aa_with_users.txt

This command introduces the "--wordlist" parameter, which is self-explanatory, and the "--rules=all" parameter, which tells john to use all the rules it ships with. These are worth exploring and reading through as you do more targeted cracking, but for now just enable them all and stop when the popping slows. You can also hit the space bar during a cracking session to see the projected time remaining.

Screenshot 8

The rule modes available in john are:

  • All (Jumbo + KoreLogic)
  • Jumbo (Wordlist + Single + Extra + NT + OldOffice)
  • Extra
  • KoreLogic
  • Loopback (NT + Split)
  • NT
  • OldOffice
  • Single
  • Single+Extra (Single + Extra + OldOffice)
  • Split
  • Wordlist

After a few runs with different wordlists, you can start to generate your own custom wordlist based on the existing results and feed it back through john rules. This is very effective in corporate environments where many people use similar patterns like company name, local sports teams, city names, etc.

~/tools/JohnTheRipper/run/john --format=NT --pot=pwned.pot hashes-aa_with_users.txt --show | cut -d":" -f2- -s >> pwned-wordlist.txt

One of the tricks to generating a wordlist from john is handling passwords that contain colons, like "asdf:lkj". Since john uses the colon as an output delimiter, you must tell cut to take everything from the second field to the end of the line ("-f2-") to make sure it collects the entire password. The "-s" option tells cut to skip lines that contain no colon at all, such as the summary line john prints after the cracked passwords.
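The extraction can be checked against a small sample. This sketch assumes `john --show` prints cracked `user:password` lines followed by a blank line and a summary line such as `2 password hashes cracked, 0 left`; the sample below is illustrative, not captured output. `cut -s` (only print lines containing the delimiter) drops the non-password lines, and `sort -u` removes duplicates; `extract_passwords` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Keep everything after the first colon (-f2-), so "asdf:lkj" survives
# intact; -s skips lines with no colon at all (the blank line and the
# summary line); sort -u drops duplicate passwords.
extract_passwords() {
  cut -d":" -f2- -s | sort -u
}

# Illustrative `john --show` output for "user:hash" input.
printf '%s\n' 'user1:123456' 'user7:asdf:lkj' '' '2 password hashes cracked, 0 left' | extract_passwords
# 123456
# asdf:lkj
```

If the session wordlist has been built up with `>>` over several runs, `sort -u -o pwned-wordlist.txt pwned-wordlist.txt` deduplicates it in place.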

Typically, we then rerun wordlist mode using our session wordlist to find permutations of passwords we have already recovered. It's a good idea to do this any time a significant number of new passwords is recovered by any method.

~/tools/JohnTheRipper/run/john --format=NT --pot=./pwned.pot --fork=4 --rules=all --wordlist=pwned-wordlist.txt hashes-aa_with_users.txt

Using Markov Mode

The final mode we wanted to use in this session is based on Markov chains. Markov mode builds statistics about which characters tend to follow which in passwords that have already been cracked, and uses them to guide guesses for the remaining hashes. It is most useful in organizations whose passwords share common patterns, for example because users get the same tips from the help desk on how to formulate strong passwords. However, passwords tend to be similar across organizations, so this mode should still generate some results from our 20,000 worst passwords list.
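As a toy illustration of the kind of statistics Markov mode is built on, the sketch below counts how often each pair of adjacent characters appears in a wordlist. It is not calc_stat and does not produce its file format; a real Markov model also needs to score the first character of a candidate and turns raw counts into probabilities. `count_bigrams` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Count each pair of adjacent characters across all words and print
# "<pair> <count>", sorted. A word of length n contributes n-1 pairs.
count_bigrams() {
  awk '{
    for (i = 1; i < length($0); i++)
      counts[substr($0, i, 2)]++
  }
  END { for (pair in counts) print pair, counts[pair] }' "$@" | LC_ALL=C sort
}

printf '%s\n' pass past | count_bigrams
# as 2
# pa 2
# ss 1
# st 1
```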

You will need your most recent session wordlist as generated above. Then generate the Markov statistics file:

~/tools/JohnTheRipper/run/calc_stat pwned-wordlist.txt markovstats

~/tools/JohnTheRipper/run/john --format=NT --fork=4 --pot=./pwned.pot --markov=300 --max-length=12 --mkv-stats=markovstats hashes-aa_with_users.txt

Screenshot 9

The higher the Markov level and the longer the maximum length, the longer this crack will run. When we are on-site and trying to recover passwords quickly, we usually start at a level of 225 or 250 and hope for a quick win. If time isn't a factor and you are aiming for the highest percentage of passwords, higher levels will recover more.

Screenshot 10

As you can see here, Markov mode reveals some 10- and 11-character passwords that would not be readily found by wordlist+rules cracking. We have been in situations where passwords found via Markov mode could be fed back into wordlist+rules mode to recover still more passwords. In general, password cracking is a highly iterative process where you build on successful cracks to get more and more passwords.

The Results

Over the course of a couple of days, we cracked 19,628 of the top 20,000 most prevalent passwords on the Pwned Passwords list. Using the same methodology, we cracked 7,211 of the 18,891 hashes in the last file (hashes-zzamhr). If you are a systems administrator or on a corporate IT security team looking to weed out weak passwords, this methodology should get you started on the right path. If you are new to penetration testing or just looking to add password cracking to your existing toolkit, this should get you comfortable working with password hashes and the iterative process of recovering passwords.
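For reporting, these counts are easier to compare as percentages. A minimal sketch, assuming the summary line that `john --show` prints has the form `N password hashes cracked, M left`; `crack_rate` is a hypothetical helper, and the "left" figures in the examples are simply derived from the numbers above (20,000 − 19,628 = 372 and 18,891 − 7,211 = 11,680).

```shell
#!/usr/bin/env bash
# Parse "N password hash(es) cracked, M left" and print N, N+M, and the rate.
crack_rate() {
  awk '/password hash(es)? cracked/ {
    cracked = $1; left = $(NF - 1); total = cracked + left
    if (total > 0)
      printf "%d of %d cracked (%.1f%%)\n", cracked, total, 100 * cracked / total
  }'
}

echo "19628 password hashes cracked, 372 left" | crack_rate
# 19628 of 20000 cracked (98.1%)
echo "7211 password hashes cracked, 11680 left" | crack_rate
# 7211 of 18891 cracked (38.2%)
```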

But no matter who you are, you may become addicted to password cracking as you try to get more and more passwords. This obsession will undoubtedly lead you to GPU cracking and hashcat, but that is a post for another day.

Before you start cracking, we want to leave you with a couple of precautions. During this exercise, we used publicly disclosed breach data for password recovery and password hashes that were not associated with any user or site, so we did not concern ourselves with securing the hashes. If you are retrieving and auditing live passwords for a real organization, you should always take care to secure the data files. Also, make sure that you have permission to audit passwords before pulling live hashes into your own environment. There are many different methods for accessing passwords in Active Directory environments, but that too is a post for another day.
