visca.com | Regex Dictionary

Step-by-step tutorial for using the Regex Dictionary

Introduction

If you are familiar with Perl's regular expressions, then you can simply search for matches in our dictionary by entering any valid regular expression. Note that the syntax for patterns in PHP's regular expressions closely resembles Perl's.

The Regex Dictionary also makes use of two special characters: $v represents all upper- and lower-case vowels (the character set [aeiouAEIOU]) and $c represents all upper- and lower-case consonants (the character set [bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]).
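As a rough illustration of what these two aliases stand for, here is a minimal Python sketch that expands them by plain text substitution. The helper name expand_aliases is hypothetical, and how the Regex Dictionary actually implements the aliases is not stated in this tutorial; Python's re module is used only because its syntax is close to Perl's for the patterns shown here.

```python
import re

# Character classes taken from the tutorial text.
VOWELS = "[aeiouAEIOU]"
CONSONANTS = "[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]"


def expand_aliases(pattern):
    """Replace $v and $c with ordinary character classes (hypothetical helper)."""
    return pattern.replace("$v", VOWELS).replace("$c", CONSONANTS)


expanded = expand_aliases("^c$vt$")
print(expanded)                           # ^c[aeiouAEIOU]t$
print(bool(re.search(expanded, "cot")))   # True
print(bool(re.search(expanded, "coat")))  # False
```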

If you are new to the world of regular expressions, then the easiest way to learn to make the most of the Regex Dictionary is by example. What follows is a step-by-step tutorial. To begin, we'll simply search for a string in all grammatical categories without filtering the results.


Matching words or parts of words

  1. A search for the string "dictionary" means "Find any word containing these ten consecutive letters". It returns only one result: "dictionary".
  2. A search for the string "cat" means "Find any word containing these three consecutive letters". Hundreds of matches are returned, including "catastrophic", "scathing", "catch", etc.
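The same kind of "contains" search can be tried outside the dictionary. The sketch below assumes a tiny made-up word list (WORDS) in place of the real dictionary, and a hypothetical helper find_matches that keeps every word in which the pattern matches anywhere.

```python
import re

# Tiny stand-in word list; the real dictionary is far larger.
WORDS = ["cat", "catch", "scathing", "dictionary", "dog", "bearcat"]


def find_matches(pattern, words):
    """Return the words in which the pattern matches anywhere."""
    regex = re.compile(pattern)
    return [w for w in words if regex.search(w)]


cat_hits = find_matches("cat", WORDS)
dictionary_hits = find_matches("dictionary", WORDS)
print(cat_hits)         # ['cat', 'catch', 'scathing', 'bearcat']
print(dictionary_hits)  # ['dictionary']
```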


Matching at the beginning or end of words

The characters ^ and $ are called anchors; they tell the program where to match the string: at the beginning of the word (^) or at the end of the word ($).

  1. We can match all words ending in "cat" by putting the anchor "$" after our search string.
    cat$  Match any word that ends in cat. Examples: bearcat, wildcat, cat, etc.
  2. We can match all words beginning with "cat" by putting the anchor "^" before our search string. Logically, then, using both anchors will limit the matches to the word cat:
    ^cat  Match any word that begins with cat. Examples: catholic, category, cattle, catch, etc.
    ^cat$  Match any word that both begins and ends with cat. Example: cat
  3. As we mentioned in the Introduction, with the Regex Dictionary you can use the special character $v to match any vowel and the character $c to match any consonant.
    ^c$vt$  Match any word that begins with c followed by a single vowel and ending in t. Examples: cat, cot and cut
    ^c$v$c$  Match any word beginning with c followed by a single vowel and ending in any single consonant. Examples: cab, cod, cut, etc.
    ^c$v$v$c$  Match any word beginning with c followed by two vowels and ending in any single consonant. Examples: cool, coal, coin, coax, etc.
  4. Note that, in the last example above, if we remove the initial anchor ^, we'll match any word ending in "c + 2 vowels + 1 consonant"; if we remove only the final anchor $, we'll match any word beginning with "c + 2 vowels + 1 consonant", and removing both anchors yields all words that contain the string.
    c$v$v$c$  Match any word that ends in c + 2 vowels + a single consonant. Examples: official, raincoat, scoot, etc.
    ^c$v$v$c  Match any word that begins with c followed by 2 vowels and a consonant. Examples: coarse, cause, coax, etc.
    c$v$v$c  Match any word that contains a c followed by 2 vowels and a consonant. Examples: enforceable, uncousinly, musicianship, etc.
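To see how each anchor narrows the results, the following Python sketch runs the anchored and unanchored variants against a small made-up word list. The helpers expand_aliases and find_matches are hypothetical, and the substitution of $v and $c is an assumption about how the aliases behave.

```python
import re

VOWELS = "[aeiouAEIOU]"
CONSONANTS = "[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]"


def expand_aliases(pattern):
    """Replace $v and $c with ordinary character classes (hypothetical helper)."""
    return pattern.replace("$v", VOWELS).replace("$c", CONSONANTS)


def find_matches(pattern, words):
    """Return the words in which the expanded pattern matches."""
    regex = re.compile(expand_aliases(pattern))
    return [w for w in words if regex.search(w)]


# Stand-in word list, not the real dictionary.
WORDS = ["cat", "cot", "cut", "cool", "coax", "catch",
         "bearcat", "raincoat", "official", "coarse"]

only_cat = find_matches("^cat$", WORDS)
ends_cat = find_matches("cat$", WORDS)
both_anchors = find_matches("^c$v$v$c$", WORDS)
end_anchor = find_matches("c$v$v$c$", WORDS)
start_anchor = find_matches("^c$v$v$c", WORDS)

print(only_cat)      # ['cat']
print(ends_cat)      # ['cat', 'bearcat']
print(both_anchors)  # ['cool', 'coax']
print(end_anchor)    # ['cool', 'coax', 'raincoat', 'official']
print(start_anchor)  # ['cool', 'coax', 'coarse']
```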


Characters and character classes

The special character "\w" represents any English letter, digit or underscore. In contrast, the dot "." represents any character at all, including a hyphen, an apostrophe or an accented letter (like the é in fiancé).

  1. We can match all nine-letter words that don't include apostrophes, hyphens or accented letters by using ^, the \w nine times, and the $. (Fortunately there is an easier way to do this, as we'll see in the section on Matching repetitions.) Using nine dots instead of the \w nine times will also return apostrophized and hyphenated words, as well as those containing accented characters.
    ^\w\w\w\w\w\w\w\w\w$  Match all nine-letter words (no hyphens, apostrophes or accented letters). Examples: arbitrary, aloofness, chronicle, etc.
    ^.........$  Match all nine-character words, including those with hyphens, apostrophes and accented letters. Examples: hunchback, all-round, fo'c's'le, recherché, etc.
    ....  Match any word that has at least 4 characters. Result: the entire dictionary with the exception of those words with three characters or fewer.
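The difference between \w and the dot can be checked with a short Python sketch. One assumption is worth flagging: by default Python's \w also matches accented letters, so the re.ASCII flag is used here to reproduce the behaviour this tutorial describes (letters, digits and underscore only). The word list is a made-up sample.

```python
import re

# "recherch\u00e9" is "recherché" written with an escape for the accented e.
WORDS = ["arbitrary", "hunchback", "all-round", "fo'c's'le",
         "recherch\u00e9", "cat"]

# re.ASCII limits \w to [a-zA-Z0-9_]; this mirrors the tutorial's description.
nine_word_chars = re.compile(r"^\w\w\w\w\w\w\w\w\w$", re.ASCII)
# The dot accepts any character (except a newline).
nine_any_chars = re.compile(r"^.........$")

plain_nine = [w for w in WORDS if nine_word_chars.search(w)]
any_nine = [w for w in WORDS if nine_any_chars.search(w)]

print(plain_nine)  # ['arbitrary', 'hunchback']
print(any_nine)    # every sample word except 'cat'
```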

Character classes are denoted by square brackets ("[]") and allow us to select a set of possible characters. For example, the character class "[aeiouy]" matches either a, e, i, o, u or y. Note that the special characters mentioned in the Introduction, $v and $c, are simply convenient aliases for character classes:
$v = [aeiouAEIOU]
$c = [bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]

  1. Here are some examples using character classes:
    ^[bcr]$vt$  Match any word beginning with b, c or r, followed by a single vowel and ending with t. Examples: but, cat, rot, bit, etc.
    [gst]ion$  Match words that end in gion, sion or tion. Examples: legion, prevision, question, etc.
    ^....[gst]ion$  Match words that begin with any four characters, followed by g, s or t and ending in ion. Examples: religion, occasion, vacation, etc.
  2. Within the brackets of a character class, the hyphen (-) is special: it specifies a range, so that if you want to match all lower-case letters you can type [a-z] rather than the cumbersome [abcdefghijklmnopqrstuvwxyz]. You can make a hyphen part of your character class by placing it first or last.
    ^[j-n]$v[j-n]$  Match any word beginning with j, k, l, m or n, followed by any vowel and ending with j, k, l, m or n. Examples: jam, kin, lam, men, nun, etc.
    ^[a-h][a-h][a-h][a-h][a-h]$  Match all five-letter words that include only the letters a through h. Examples: badge, beach, dacha, etc.
  3. You can negate a character class by putting the character "^" right after the opening bracket.
    g[^aou]\w$c$  Match any word that ends in g followed by any character except a, o or u, then any letter or digit, then a consonant. Examples: germ, almighty, meaningful, etc.

    Note: Remember that the character "^" has two special meanings: at the beginning of a pattern, it anchors the match to the beginning of the word (for example, "^c" matches all words that begin with c). At the beginning of a character class, it negates all the characters within the brackets.

    ^g[^aou][m-z]$  Match any word beginning with g, followed by any character except a, o or u and ending in any letter from m to z. Examples: gem, gym, get, etc.
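Sets, ranges and negated classes can be tried side by side in Python. This is only a sketch over a made-up word list; the $v alias is written out by hand as [aeiouAEIOU], and the helper name find_matches is hypothetical.

```python
import re

# Stand-in word list, not the real dictionary.
WORDS = ["gem", "gym", "get", "gum", "gap", "jam", "kin",
         "badge", "beach", "zebra", "legion", "opinion"]


def find_matches(pattern, words):
    """Return the words in which the pattern matches."""
    regex = re.compile(pattern)
    return [w for w in words if regex.search(w)]


# A set of alternatives for one position: g, s or t before "ion".
set_hits = find_matches(r"[gst]ion$", WORDS)
# Ranges: first and last letter between j and n, any vowel in the middle.
range_hits = find_matches(r"^[j-n][aeiouAEIOU][j-n]$", WORDS)
# Five positions, each restricted to the letters a through h.
a_to_h_hits = find_matches(r"^[a-h][a-h][a-h][a-h][a-h]$", WORDS)
# Negation: the second character may be anything except a, o or u.
negated_hits = find_matches(r"^g[^aou][m-z]$", WORDS)

print(set_hits)      # ['legion']
print(range_hits)    # ['jam', 'kin']
print(a_to_h_hits)   # ['badge', 'beach']
print(negated_hits)  # ['gem', 'gym', 'get']
```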


Alternatives: Matching this or that

The special character "|" allows us to search for two or more alternatives. Examples:

speak|talk  Match any word that includes the words talk or speak. Examples: doublespeak, talky, trash-talk, etc.
hand|mind|eye  Match any word that includes the words hand, mind or eye. Examples: absentminded, handicapped, cross-eyed, eyedrops, etc.
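A quick way to experiment with alternatives is a Python sketch like the one below. It assumes a small made-up word list and a hypothetical find_matches helper; the patterns are the two from the examples above.

```python
import re

# Stand-in word list, not the real dictionary.
WORDS = ["doublespeak", "talky", "trash-talk", "whisper",
         "eyedrops", "absentminded", "handicapped"]


def find_matches(pattern, words):
    """Return the words in which any of the alternatives matches."""
    regex = re.compile(pattern)
    return [w for w in words if regex.search(w)]


two_way = find_matches("speak|talk", WORDS)
three_way = find_matches("hand|mind|eye", WORDS)

print(two_way)    # ['doublespeak', 'talky', 'trash-talk']
print(three_way)  # ['eyedrops', 'absentminded', 'handicapped']
```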


Groupings

We can group our alternatives using parentheses.

  1. Here are some examples of groupings:
    (hand|mind)ed$  Match any word that ends in either handed or minded. Examples: empty-handed, narrow-minded, etc.
    (hand|mind|ey)ed$  Match any word that ends in either handed, minded or eyed. Examples: those listed above, plus eagle-eyed, hackneyed, etc.
    ^(g|q)u  Match any word beginning with either gu or qu. Examples: guitar, quarter, etc.
    (care|fear|harm)  Match any word that includes the words care, fear or harm. Examples: scare, fearless, pharmacy, etc.
  2. Note that groupings can be nested: you can search for alternatives within alternatives:
    b(e(e|a))r  Match any word that contains a b followed by either eer or ear. Examples: bear, beer, overbear, etc.
    t(ian|en(d|t))$  Match any word ending in either tian, tend or tent. Examples: dalmatian, content, extend, etc.
    ^(p|q)$v$v(b|r|t)$  Match any word that begins with either p or q, followed by 2 vowels, and ending in either b, r or t. Examples: poor, pear, poet, quit, etc.
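The Python sketch below shows why the parentheses matter: without them, "|" splits the whole pattern, so "hand|minded$" means "hand anywhere, or minded at the end". The word list and the find_matches helper are made up for illustration.

```python
import re

# Stand-in word list, not the real dictionary.
WORDS = ["empty-handed", "narrow-minded", "handbag", "eagle-eyed",
         "dalmatian", "content", "extend", "bear", "beer", "bar"]


def find_matches(pattern, words):
    """Return the words in which the pattern matches."""
    regex = re.compile(pattern)
    return [w for w in words if regex.search(w)]


# Ungrouped: the alternatives are "hand" and "minded$".
ungrouped = find_matches(r"hand|minded$", WORDS)
# Grouped: the alternatives are only "hand" and "mind"; "ed$" applies to both.
grouped = find_matches(r"(hand|mind)ed$", WORDS)
# Nested groups: tian, tend or tent at the end of the word.
nested_end = find_matches(r"t(ian|en(d|t))$", WORDS)
# Nested groups: b followed by eer or ear.
nested_mid = find_matches(r"b(e(e|a))r", WORDS)

print(ungrouped)   # ['empty-handed', 'narrow-minded', 'handbag']
print(grouped)     # ['empty-handed', 'narrow-minded']
print(nested_end)  # ['dalmatian', 'content', 'extend']
print(nested_mid)  # ['bear', 'beer']
```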


Matching repetitions

The quantifier metacharacters ?, *, +, and {} allow us to specify how many times a character or group of characters can be repeated:

a? Match a 1 or 0 times.
a+ Match a 1 or more times (at least once).
a* Match a 0 or more times (the a is optional, or it can appear any number of times).
a{n,m} Match a at least n times, but not more than m times.
a{n,} Match a at least n times.
a{n} Match a exactly n times.

Here are some simple examples of the use of quantifiers:

^bea?$c$  Match any word consisting of be or bea followed by a single consonant. Examples: bead, beak, bet, beg, etc.
hop*e  Match any word containing ho and e with any number of p's (including none) in between, such as hoe, hope or hoppe. Examples: whoever, hopeful, orthopedics, shopper, etc.
.*'.*'.*  Match any word containing two or more apostrophes. Examples: 'tain't, bo's'n, rock'n'roll and fo'c's'le
a$c{4,6}  Match any word containing an a followed by 4 to 6 consonants. Because the pattern is not anchored, an a followed by a longer run of consonants also matches. Examples: anything, scratchproof, bathysphere, etc.
\w{12,}  Match any word with 12 letters or more. Examples: absentminded, conversational, interdisciplinary, psychopharmacological, etc.
^\w{12}$  Match all 12-letter words. Examples: freewheeling, contribution, weatherstrip, etc.
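The quantifier examples can be checked with a short Python sketch. It assumes a made-up word list, the hypothetical helpers expand_aliases and find_matches, and the re.ASCII flag so that \w means only letters, digits and underscore, as this tutorial describes.

```python
import re

VOWELS = "[aeiouAEIOU]"
CONSONANTS = "[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]"


def expand_aliases(pattern):
    """Replace $v and $c with ordinary character classes (hypothetical helper)."""
    return pattern.replace("$v", VOWELS).replace("$c", CONSONANTS)


def find_matches(pattern, words):
    """Return the words in which the expanded pattern matches."""
    regex = re.compile(expand_aliases(pattern), re.ASCII)
    return [w for w in words if regex.search(w)]


# Stand-in word list, not the real dictionary.
WORDS = ["bead", "bet", "beard", "whoever", "hopeful", "shopper",
         "fo'c's'le", "anything", "contribution", "absentminded"]

optional_a = find_matches(r"^bea?$c$", WORDS)       # a? : zero or one a
any_ps = find_matches(r"hop*e", WORDS)              # p* : zero or more p's
apostrophes = find_matches(r".*'.*'.*", WORDS)      # at least two apostrophes
consonant_run = find_matches(r"a$c{4,6}", WORDS)    # a plus 4 to 6 consonants
twelve_letters = find_matches(r"^\w{12}$", WORDS)   # exactly 12 word characters

print(optional_a)      # ['bead', 'bet']
print(any_ps)          # ['whoever', 'hopeful', 'shopper']
print(apostrophes)     # ["fo'c's'le"]
print(consonant_run)   # ['anything']
print(twelve_letters)  # ['contribution', 'absentminded']
```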


Backreferences: Matching what you've already matched (Capturing)

Parentheses (see Groupings) also allow the extraction of the parts of a string that matched. Each parenthesized group "captures" whatever it matched, and the backreference \1 then matches that same text again, as captured by the regex's first group. Perhaps this is best understood using simple examples.

  1. Examples of the use of backreferences to repeat captured matches:
    h(\w)\1  Match any word containing an h followed by the same letter twice. Examples: hook, sheep, shiitake, nighttime, etc.
    $c($c)\1  Match any word containing a single consonant followed by a doubled consonant. Examples: idyllic, dumbbell, hitchhiker, withholding, etc.
    ($c)\1(ing|ed)$  Match any word ending in doubled-consonant + ed or doubled-consonant + ing. Examples: fitting, unerring, matted, blessed, etc.
    ([aiu])\1  Match any word that contains aa, ii or uu. Examples: bazaar, radii, vacuum, etc.
    ([jkvwxy])\1  Match any word that contains jj, kk, vv, ww, xx or yy. Examples: knickknack, savvy, glowworm, sayyd, etc.
  2. In the same way that \1 refers back to the first parenthesized match, \2 refers back to the second, \3 to the third, etc. Examples:
    ^(g|q)\w($v)\2  Match any word beginning with g or q followed by any letter followed by a doubled vowel (\2 refers back to the vowel). Examples: gloomy, greeting, queen, etc.
    ^($c)($v)\1\2  Match any word beginning with a consonant and a vowel followed by the same consonant plus the same vowel (\1 refers back to the consonant, \2 to the vowel). Examples: nonobjective, papa, cocoon, etc.
    ^($c)($v)\2\1  Match any word beginning with a consonant and a vowel followed by the same vowel plus the same consonant (\2 refers back to the vowel, \1 to the consonant). Examples: noon, seesaw, tooth, etc.
    ($c$c$v)\1  Match any word containing the same consonant-consonant-vowel sequence twice in a row. Examples: commonsense, discontented, alfalfa, Mississippi, etc.
    (\w)\1(\w)\2(\w)\3  Match all words that have three consecutive doubled letters. Result: only bookkeeping.
    ^(\w)(\w)\w?\2\1$  Match all 4- or 5-letter words that are spelled the same backwards and forwards. Examples: civic, deed, level, noon, peep, radar, rotor, etc.
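Backreferences behave the same way in Python's re module, so a few of the patterns above can be tried in a short sketch. The word list and the helpers expand_aliases and find_matches are hypothetical; raw strings (r"...") keep Python from interpreting \1 and \2 itself.

```python
import re

VOWELS = "[aeiouAEIOU]"
CONSONANTS = "[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]"


def expand_aliases(pattern):
    """Replace $v and $c with ordinary character classes (hypothetical helper)."""
    return pattern.replace("$v", VOWELS).replace("$c", CONSONANTS)


def find_matches(pattern, words):
    """Return the words in which the expanded pattern matches."""
    regex = re.compile(expand_aliases(pattern))
    return [w for w in words if regex.search(w)]


# Stand-in word list, not the real dictionary.
WORDS = ["hook", "sheep", "bookkeeping", "papa", "cocoon", "noon",
         "civic", "level", "deed", "radar", "banana", "fitting"]

# \1 must repeat exactly what the first group matched.
doubled_after_h = find_matches(r"h(\w)\1", WORDS)
# Three groups, each immediately repeated by its own backreference.
triple_double = find_matches(r"(\w)\1(\w)\2(\w)\3", WORDS)
# Consonant + vowel, then the same consonant + the same vowel.
repeated_pair = find_matches(r"^($c)($v)\1\2", WORDS)
# 4- or 5-letter palindromes: the groups are replayed in reverse order.
palindromes = find_matches(r"^(\w)(\w)\w?\2\1$", WORDS)

print(doubled_after_h)  # ['hook', 'sheep']
print(triple_double)    # ['bookkeeping']
print(repeated_pair)    # ['papa', 'cocoon']
print(palindromes)      # ['noon', 'civic', 'level', 'deed', 'radar']
```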


Filtering the results

On the Regex Dictionary search page, beneath the String field, there is a field called Filter String. This field works with exactly the same regular expressions as the field above it, but to opposite effect: whatever matches the filter string will be removed from the results list. Here are some simple examples:
['-] Eliminate all apostrophized and hyphenated words from the results.
.{7} Eliminate all words of seven or more characters from the results.
ly$ Eliminate any word ending in ly from the results.
^p?re Eliminate any word beginning in either pre or re from the results.

For the next two examples we'll leave the String field empty, which matches the entire dictionary.
$v  Eliminate all words that have vowels. What remains: dry, crypt, wryly, flyby, etc.
[bcdfghjklmnpqrstvwxz]  Eliminate all words that have lower-case consonants (not counting y as a consonant). What remains: eye, May, yo-yo, you, etc.
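The two-field search can be approximated in Python as a match step followed by a removal step. This is a sketch under stated assumptions, not the site's actual implementation: the function name search, its parameters and the word list are all hypothetical. An empty String pattern is modelled as the empty regex, which matches every word.

```python
import re

VOWELS = "[aeiouAEIOU]"
CONSONANTS = "[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]"


def expand_aliases(pattern):
    """Replace $v and $c with ordinary character classes (hypothetical helper)."""
    return pattern.replace("$v", VOWELS).replace("$c", CONSONANTS)


def search(words, pattern="", filter_pattern=None):
    """Keep words matching pattern, then drop those matching filter_pattern."""
    keep = re.compile(expand_aliases(pattern))
    hits = [w for w in words if keep.search(w)]
    if filter_pattern:
        drop = re.compile(expand_aliases(filter_pattern))
        hits = [w for w in hits if not drop.search(w)]
    return hits


# Stand-in word list, not the real dictionary.
WORDS = ["dry", "crypt", "eye", "May", "yo-yo",
         "prepare", "reply", "quickly", "cat"]

no_vowels = search(WORDS, "", "$v")
no_lower_consonants = search(WORDS, "", "[bcdfghjklmnpqrstvwxz]")
y_without_punctuation = search(WORDS, "y", "['-]")

print(no_vowels)              # ['dry', 'crypt']
print(no_lower_consonants)    # ['eye', 'May', 'yo-yo']
print(y_without_punctuation)  # ['dry', 'crypt', 'eye', 'May', 'reply', 'quickly']
```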
