Regular Expressions are both complex and elegant at the same time. They can be made to look like someone was just randomly hammering on their keyboard. They are also an incredibly efficient and elegant solution to describing the structure of text and matching those structures. They are very handy for defining what a string should look like and as such are very good for use in data validation.
To validate a US phone number you might create a simple regular expression, \d{3}-\d{3}-\d{4}, which will match a phone number like 123-555-1212. Regular expressions become difficult when the format is not as clear cut. What if we need to support (123) 555-1212 and just 555-1212? That's where things get more complex. But this post is not about validating phone numbers. In it I will look at how to make assertions about the complexity of a string, which is very useful if you want to enforce the complexity of a user-created password.
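As a quick illustration of the simple phone pattern, here is a minimal sketch in Python. The post itself is not tied to any language, so the use of Python's re module and the helper name is_simple_us_phone are assumptions made for this example only.

```python
import re

# The simple pattern from the text: 3 digits, dash, 3 digits, dash, 4 digits.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_simple_us_phone(text: str) -> bool:
    """Return True only if the whole string looks like 123-555-1212."""
    # fullmatch requires the pattern to cover the entire string.
    return PHONE.fullmatch(text) is not None


print(is_simple_us_phone("123-555-1212"))    # True
print(is_simple_us_phone("(123) 555-1212"))  # False
print(is_simple_us_phone("555-1212"))        # False
```

The two formats that fail are exactly the ones that would need a more involved expression.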
The key to making password strength validation easy using Regular Expressions is to understand Zero-width positive lookahead assertions (also known as zero-width positive lookaheads). Now that's a mouthful, isn't it? Luckily the concept itself is a lot simpler than the name.
Zero-width positive lookahead assertions
A Zero-width positive lookahead assertion is simply an assertion that a pattern matches at the current position in the string. Rather than returning a match, it consumes no characters and merely reports true or false to say whether that match exists. It is used as a qualification for another match.
The general form of this is:
(?=pattern)
For example:
- The regular expression z matches the z in the string zorched.
- The regular expression z(?=o) also matches the z in the string zorched. It does not match the zo, but only the z.
- The regular expression z(?=o) does NOT match the z in the string pizza because the assertion of z followed by an o is not true.
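The three cases above can be checked directly. This is a small sketch using Python's re module (an assumption, since the post does not name a language); the variable names are made up for the example.

```python
import re

# Plain "z" matches the first character of "zorched".
plain = re.search(r"z", "zorched")

# "z(?=o)" also matches, but the lookahead consumes nothing:
# the match is just "z", not "zo".
match_zorched = re.search(r"z(?=o)", "zorched")

# In "pizza" neither z is followed by an o, so there is no match at all.
match_pizza = re.search(r"z(?=o)", "pizza")

print(plain.group())          # z
print(match_zorched.group())  # z
print(match_zorched.span())   # (0, 1)
print(match_pizza)            # None
```

The span (0, 1) shows that only the z was consumed, which is what "zero-width" refers to.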
Making Assertions About Password Complexity
Now that you know how to make assertions about the contents of a string without actually matching on that string, you can start deciding what you want to actually assert. Remember that on their own these lookaheads do not match anything, but they modify what is matched.
Assert a string is 8 or more characters: (?=.{8,})
Assert a string contains at least 1 lowercase letter (zero or more characters followed by a lowercase character): (?=.*[a-z])
Assert a string contains at least 1 uppercase letter (zero or more characters followed by an uppercase character): (?=.*[A-Z])
Assert a string contains at least 1 digit: (?=.*[\d])
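Assert a string contains at least 1 special character, meaning any non-word character (note that \W does not match the underscore, which counts as a word character): (?=.*[\W])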
Assert a string contains at least 1 special character or a digit: (?=.*[\d\W])
These are of course just a few common examples but there are many more that you could create as well.
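To see how each assertion behaves on its own, the sketch below tries every one of them against a few sample strings. It assumes Python's re module; the ASSERTIONS table and the passing_rules helper are hypothetical names introduced for this illustration.

```python
import re

# The assertions listed above, keyed by a short description.
ASSERTIONS = {
    "8+ characters": r"(?=.{8,})",
    "lowercase letter": r"(?=.*[a-z])",
    "uppercase letter": r"(?=.*[A-Z])",
    "digit": r"(?=.*[\d])",
    "special character": r"(?=.*[\W])",
    "digit or special": r"(?=.*[\d\W])",
}


def passing_rules(text: str) -> list:
    """Return the descriptions of the assertions that hold for text."""
    # re.match evaluates each lookahead at the start of the string.
    return [name for name, pattern in ASSERTIONS.items()
            if re.match(pattern, text)]


print(passing_rules("abc"))         # ['lowercase letter']
print(passing_rules("snake_case"))  # ['8+ characters', 'lowercase letter']
print(len(passing_rules("Passw0rd!")))  # 6

# A successful lookahead on its own matches the empty string.
print(repr(re.match(r"(?=.*[a-z])", "abc").group()))  # ''
```

The snake_case result shows that the underscore does not count as a special character for \W.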
Applying Assertions to Create a Complexity Validation
Knowing that these make assertions about elements in a string, but not about a match itself, you need to combine them with a matching regular expression to create your validation.
.* matches zero or more characters. ^ matches the beginning of a string. $ matches the end of a string.
Put together, ^.*$ matches any single line (including an empty line). With what you now know about Zero-width positive lookahead assertions, you can combine this "match everything" expression with assertions about that line to limit what is matched.
If I want to match a line with at least 1 lowercase character then I can use: ^.*(?=.*[a-z]).*$ (Which reads something like: start of string, zero or more characters, assert that somewhere in the string is a lowercase character, zero or more trailing characters, the end of the string)
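Here is a minimal sketch of the single-assertion pattern, written out as ^.*(?=.*[a-z]).*$ to agree with the reading given above. Python's re module and the name HAS_LOWER are assumptions for the example.

```python
import re

# Match a whole line only if it contains at least one lowercase letter.
HAS_LOWER = re.compile(r"^.*(?=.*[a-z]).*$")

print(bool(HAS_LOWER.match("HELLO world")))  # True
print(bool(HAS_LOWER.match("HELLO WORLD")))  # False

# When the assertion holds, the surrounding .* pieces match the whole line.
print(HAS_LOWER.match("ABCd").group())       # ABCd
```

The lookahead only decides whether the match is allowed. The text that comes back is whatever ^.*$ would have matched anyway.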
The part that makes this all interesting is that you can combine any number of assertions about the string into one larger expression that will create your rules for complexity. So if you want to match a string at least 6 characters long, with at least one lowercase and at least one uppercase letter you could use something like: ^.*(?=.{6,})(?=.*[a-z])(?=.*[A-Z]).*$
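Because the assertions are independent of one another, the combined expression can be assembled by simple concatenation. The sketch below does that in Python (an assumed language choice); RULES, COMBINED and LENGTH_AND_CASE are hypothetical names.

```python
import re

# One lookahead per rule; order does not matter.
RULES = [
    r"(?=.{6,})",     # at least 6 characters
    r"(?=.*[a-z])",   # at least one lowercase letter
    r"(?=.*[A-Z])",   # at least one uppercase letter
]

# Wrap the assertions in the "match everything" expression.
COMBINED = "^.*" + "".join(RULES) + ".*$"
print(COMBINED)  # ^.*(?=.{6,})(?=.*[a-z])(?=.*[A-Z]).*$

LENGTH_AND_CASE = re.compile(COMBINED)

for candidate in ["abcDEF", "abcdef", "ABCDEF", "abCD"]:
    print(candidate, bool(LENGTH_AND_CASE.match(candidate)))
# abcDEF True
# abcdef False
# ABCDEF False
# abCD False
```

Adding or removing a rule is then a one-line change to the list.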
And if you want to throw in some extra complexity and require at least one digit or one symbol you could make a match like: ^.*(?=.{6,})(?=.*[a-z])(?=.*[A-Z])(?=.*[\d\W]).*$
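Put to work as a password check, the full expression might be wrapped like this. This is only a sketch under the assumption that Python's re module is used; the function name is_complex_enough is invented for the example.

```python
import re

# 6+ characters, a lowercase letter, an uppercase letter,
# and at least one digit or non-word character.
COMPLEXITY = re.compile(
    r"^.*(?=.{6,})(?=.*[a-z])(?=.*[A-Z])(?=.*[\d\W]).*$"
)


def is_complex_enough(password: str) -> bool:
    """Return True if password satisfies every assertion in COMPLEXITY."""
    return COMPLEXITY.match(password) is not None


print(is_complex_enough("Secret1"))    # True  (digit)
print(is_complex_enough("Secret!"))    # True  (symbol)
print(is_complex_enough("Secrets"))    # False (no digit or symbol)
print(is_complex_enough("secret1"))    # False (no uppercase letter)
print(is_complex_enough("Se1"))        # False (too short)
print(is_complex_enough("Pass word"))  # True  (a space counts as \W)
print(is_complex_enough("Pass_word"))  # False (underscore is a word character)
```

Two details are worth knowing when using this in Python specifically: . does not match a newline by default, and $ also matches just before a trailing newline, so \Z can be used in place of $ if that matters for your input.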
There you go. Now you can create regular expressions to check the complexity of passwords.