Characters and character classes
The special character "\w" represents any English letter or digit. In contrast, the dot "." represents any character at all, including a hyphen, an apostrophe or an accented letter (like the é in fiancé).
- We can match all nine-letter words that contain no apostrophes, hyphens or accented letters by typing ^, then \w nine times, then $. (Fortunately there is an easier way to do this, as we'll see in the section on Matching repetitions.) Using nine dots instead of nine \w's also returns apostrophized and hyphenated words, as well as those containing accented characters.
^\w\w\w\w\w\w\w\w\w$
| Match all nine-letter words (no hyphens, apostrophes or accented letters)
| arbitrary, aloofness, chronicle, etc.
|
^.........$
| Match all nine-character words, including those with hyphens, apostrophes or accented letters.
| hunchback, all-round, fo'c's'le, recherché, etc.
|
....
| Match any word that has at least four characters (the pattern is not anchored with ^ or $, so the four characters may appear anywhere in the word)
| The entire dictionary, except words of three characters or fewer.
|
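The difference between \w and the dot can be tried outside the dictionary with a minimal Python sketch. The word list here is a made-up sample, not the real dictionary, and Python's re module is only a stand-in for whatever engine the site uses. The re.ASCII flag is an assumption added so that \w excludes accented letters, as described above. Note that in Python \w also accepts the underscore.

```python
import re

# Made-up sample list (an assumption); the real dictionary is far larger.
WORDS = ["arbitrary", "chronicle", "all-round", "fo'c's'le",
         "recherch\u00e9", "hunchback", "word", "cat"]

# re.ASCII limits \w to a-z, A-Z, 0-9 and _, so accented letters are excluded.
nine_word_chars = re.compile(r"^\w\w\w\w\w\w\w\w\w$", re.ASCII)
# The dot accepts any character, including hyphens, apostrophes and accents.
nine_any_chars = re.compile(r"^.........$")
# No anchors: four characters in a row anywhere in the word are enough.
four_or_more = re.compile(r"....")

strict = [w for w in WORDS if nine_word_chars.search(w)]
loose = [w for w in WORDS if nine_any_chars.search(w)]
at_least_four = [w for w in WORDS if four_or_more.search(w)]

print(strict)         # ['arbitrary', 'chronicle', 'hunchback']
print(loose)          # the six nine-character entries
print(at_least_four)  # everything except 'cat'
```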
Character classes are denoted by square brackets ("[]") and allow us to select a set of possible characters. For example, the character class "[aeiouy]" matches either a, e, i, o, u or y. Note that the special characters mentioned in the Introduction, $v and $c, are simply convenient aliases for character classes:
$v = [aeiouAEIOU]
$c = [bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]
- Here are some examples using character classes:
^[bcr]$vt$
| Match any word beginning with either b or c or r, followed by a single vowel and ending with t.
| but, cat, rot, bit, etc.
|
[gst]ion$
| Match words that end in either gion, sion or tion
| legion, prevision, question, etc.
|
^....[gst]ion$
| Match words that begin with any four characters, followed by either g or s or t and ending in ion.
| religion, occasion, vacation, etc.
|
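Because $v and $c are aliases specific to this dictionary, a general-purpose regex engine will not understand them. The sketch below shows one way the three patterns above could be run in Python: a hypothetical helper, expand_aliases, substitutes the character classes given earlier before compiling. The helper name and the sample word list are assumptions for illustration only.

```python
import re

# Alias expansions exactly as listed in the text above.
ALIASES = {
    "$v": "[aeiouAEIOU]",
    "$c": "[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]",
}

def expand_aliases(pattern):
    """Hypothetical helper: replace $v and $c with their character classes."""
    for alias, char_class in ALIASES.items():
        pattern = pattern.replace(alias, char_class)
    return pattern

# Made-up sample list (an assumption), not the real dictionary.
WORDS = ["but", "cat", "rot", "bat", "boat",
         "legion", "question", "religion", "lion"]

def matches(pattern):
    """Return the sample words matched by a dictionary-style pattern."""
    regex = re.compile(expand_aliases(pattern))
    return [w for w in WORDS if regex.search(w)]

print(expand_aliases("^[bcr]$vt$"))  # ^[bcr][aeiouAEIOU]t$
print(matches("^[bcr]$vt$"))         # ['but', 'cat', 'rot', 'bat']
print(matches("[gst]ion$"))          # ['legion', 'question', 'religion']
print(matches("^....[gst]ion$"))     # ['question', 'religion']
```

Plain string replacement is enough here because neither alias can appear inside the other's expansion.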
- Within the brackets of a character class, the hyphen (-) is special: it specifies a range, so to match all lower-case letters you can type [a-z] rather than the cumbersome [abcdefghijklmnopqrstuvwxyz]. To make a literal hyphen part of your character class, place it first or last within the brackets.
^[j-n]$v[j-n]$
| Match any word beginning with either j, k, l, m or n followed by any vowel and ending with either j, k, l, m or n.
| jam, kin, lam, men, nun, etc.
|
^[a-h][a-h][a-h][a-h][a-h]$
| Match all five-letter words that include only the letters a through h.
| badge, beach, dacha, etc.
|
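The range shorthand and the literal-hyphen rule can be checked with a short Python sketch. The word list is a made-up sample, and Python's re module is assumed to treat ranges the same way as the dictionary's engine.

```python
import re
import string

# Made-up sample list (an assumption), not the real dictionary.
WORDS = ["badge", "beach", "dacha", "bacon", "all-round", "jam", "o'clock"]

# A range is shorthand for listing every character between its endpoints.
short_form = re.compile(r"[a-z]")
long_form = re.compile(r"[abcdefghijklmnopqrstuvwxyz]")
same_behaviour = all(
    bool(short_form.fullmatch(ch)) == bool(long_form.fullmatch(ch))
    for ch in string.printable
)

# Five characters, each restricted to the range a through h.
five_a_to_h = re.compile(r"^[a-h][a-h][a-h][a-h][a-h]$")
a_to_h_words = [w for w in WORDS if five_a_to_h.search(w)]

# Placed first in the brackets, the hyphen is an ordinary character.
hyphen_or_apostrophe = re.compile(r"[-']")
punctuated = [w for w in WORDS if hyphen_or_apostrophe.search(w)]

print(same_behaviour)  # True
print(a_to_h_words)    # ['badge', 'beach', 'dacha']
print(punctuated)      # ['all-round', "o'clock"]
```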
- You can negate a character class by putting the character "^" right after the first bracket.
g[^aou]\w$c$
| Match any word that ends in g, followed by any character except a, o or u, followed by any letter or digit, followed by a consonant.
| germ, almighty, meaningful, etc.
|
Note: Remember that the character "^" has two special meanings. At the beginning of a pattern, it anchors the match to the start of the word (e.g., "^c" matches all words that begin with c). Immediately after the opening bracket of a character class, it negates all the characters within the brackets.
^g[^aou][m-z]$
| Match any three-character word beginning with g, followed by any character except a, o or u, and ending in any letter from m to z.
| gem, gym, get, etc.
|
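The two roles of "^" can be seen side by side in a small Python sketch. The word list is a made-up sample, and the last check illustrates that a negated class accepts any character outside the brackets, not only letters.

```python
import re

# Made-up sample list (an assumption), not the real dictionary.
WORDS = ["gem", "gym", "get", "gum", "gap", "gel", "germ", "cog"]

# First ^ anchors to the start of the word; second ^ negates the class.
pattern = re.compile(r"^g[^aou][m-z]$")
three_letter_g = [w for w in WORDS if pattern.search(w)]

# A negated class matches any character not listed, including a hyphen.
negated = re.compile(r"g[^aou]")
hyphen_hit = negated.search("g-force")

print(three_letter_g)      # ['gem', 'gym', 'get']
print(hyphen_hit.group())  # g-
```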