REGEX
Regex is short for Regular Expression. It helps to match, find, or manage text.
What Are Regular Expressions (Regex)?
A regular expression is a string of characters that expresses a search pattern. The term is often abbreviated as Regex or Regexp. It is used mainly to find or replace words in text. We can also use it to test whether a text complies with rules we set.
For example, let's say you have a list of filenames and you only want to find the files with the pdf extension. Typing the expression ^\w+\.pdf$ will work. The meaning of each part of this expression will become clearer as the steps progress.
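As a minimal sketch of the filename example, assuming Python's `re` module (the text names no language) and a made-up file list:

```python
import re

# Hypothetical file list, used only for illustration.
filenames = ["report.pdf", "notes.txt", "summary_2021.pdf", "image.pdf.png"]

# ^ and $ anchor the pattern to the whole name; \w+ is the base name.
pdf_pattern = re.compile(r"^\w+\.pdf$")
pdf_files = [name for name in filenames if pdf_pattern.search(name)]

print(pdf_files)  # ['report.pdf', 'summary_2021.pdf']
```

The name image.pdf.png is rejected because $ requires the string to end right after pdf.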
Basic Matchers
The character or word we want to find is written directly, as in a normal search. For example, to find the word curious in the text, type curious.
The period . matches any single character, including special characters and spaces. In most engines it does not match a line break by default.
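The two ideas above, a literal match and the period, might look like this in Python's `re` module (an assumed language choice; the sample strings are invented):

```python
import re

# A literal pattern matches exactly the characters typed.
literal_hits = re.findall(r"curious", "curious cat, curious dog")

# The period stands for any single character (letter, space, symbol).
dot_hits = re.findall(r"c.t", "cat c t c#t ct")

print(literal_hits)  # ['curious', 'curious']
print(dot_hits)      # ['cat', 'c t', 'c#t']
```

The final ct is skipped because the period must consume exactly one character.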
Character Sets [abc]
If one of the characters in a word can be any of several characters, we write all the alternatives inside square brackets []. For example, to write an expression that can find all the words in the text, type the characters a, e, i, o, u adjacently within square brackets [].
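A short sketch of a character set, assuming Python's `re` module and a sample text of five words that differ only in their vowel (the sample text is an assumption, since the exercise text is not shown):

```python
import re

# Assumed sample text: same consonants, different vowel in the middle.
text = "bar ber bir bor bur"

# [aeiou] accepts exactly one of the listed characters at that position.
vowel_words = re.findall(r"b[aeiou]r", text)

print(vowel_words)  # ['bar', 'ber', 'bir', 'bor', 'bur']
```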
Negated Character Sets [^abc]
To find all the words in the text below except ber and bor, type e and o side by side after the caret ^ character inside square brackets [].
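The negated set can be tried out as follows. This is a sketch in Python's `re` module, and the sample text is assumed:

```python
import re

# Assumed sample text containing ber and bor among other words.
text = "bar ber bir bor bur"

# [^eo] accepts any single character except e and o.
without_e_or_o = re.findall(r"b[^eo]r", text)

print(without_e_or_o)  # ['bar', 'bir', 'bur']
```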
Character Sets: Alphanumeric Range
Letter Range [a-z]
To find the letters in a specified range, the starting letter and the ending letter are written in square brackets [] with a dash - between them. The range is case-sensitive. Type the expression that will select all lowercase letters between e and o, inclusive.
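A minimal check of the letter range, assuming Python's `re` module and the alphabet as sample input:

```python
import re

# [e-o] covers every lowercase letter from e through o, inclusive.
lower_hits = re.findall(r"[e-o]", "abcdefghijklmnopqrstuvwxyz")

# The range is case-sensitive, so uppercase letters do not match.
upper_hits = re.findall(r"[e-o]", "EFGHIJKLMNO")

print("".join(lower_hits))  # efghijklmno
print(upper_hits)           # []
```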
Character Sets: Digit Range
Number Range [0-9]
To find the numbers in a specified range, the starting number and the ending number are written in square brackets [] with a dash - between them. Write an expression that will select all numbers between 3 and 6, inclusive.
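The digit range works the same way. A sketch in Python's `re` module, with the ten digits as assumed input:

```python
import re

# [3-6] matches a single digit from 3 through 6, inclusive.
digit_hits = re.findall(r"[3-6]", "0123456789")

print(digit_hits)  # ['3', '4', '5', '6']
```

Note that the range describes single characters, not numeric values, so [3-6] never matches a two-digit number as a whole.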
Repetitions
Some special characters are used to specify how many times a character is repeated in the text. These special characters are the plus +, the asterisk *, and the question mark ?.
Repetitions: Asterisk
Asterisk *
We put an asterisk * after a character to indicate that the character may not occur at all or may occur many times. For example, indicate that the letter e may be absent from the text, or may occur one or more times in a row.
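To see "zero or more" in practice, here is a small sketch using Python's `re` module and an assumed sample text:

```python
import re

# Assumed sample text: the letter e occurs zero, one, and two times.
text = "br ber beer"

# e* allows the e to be missing or repeated any number of times.
star_hits = re.findall(r"be*r", text)

print(star_hits)  # ['br', 'ber', 'beer']
```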
Repetitions: The Plus
Plus Sign +
To indicate that a character can occur one or more times, we put a plus sign + after the character. For example, indicate that the letter e can occur one or more times in the text.
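The same assumed sample text shows how the plus differs from the asterisk. This is a sketch in Python's `re` module:

```python
import re

# Assumed sample text: the letter e occurs zero, one, and two times.
text = "br ber beer"

# e+ requires at least one e, so "br" no longer matches.
plus_hits = re.findall(r"be+r", text)

print(plus_hits)  # ['ber', 'beer']
```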
Repetitions: The Question Mark
Question Mark ?
To indicate that a character is optional, we put a question mark ? after the character. For example, indicate that the letter u is optional.
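An optional letter u could be tested like this, assuming Python's `re` module and two spellings of the same word as sample input:

```python
import re

# u? means the u may appear once or not at all.
spellings = re.findall(r"colou?r", "color, colour")

print(spellings)  # ['color', 'colour']
```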
Repetitions: Curly Braces
To express an exact number of occurrences of a character, we write the count inside curly braces {n} after the character. For example, indicate that the letter e can occur only 2 times.
To express at least a certain number of occurrences, we write the minimum count followed by a comma inside curly braces {n,} after the character. For example, indicate that the letter e can occur at least 3 times.
To express a number of occurrences within a range, we write the lower and upper bounds inside curly braces {x,y} after the character. For example, indicate that the letter e can occur only between 1 and 3 times.
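The three curly-brace forms can be compared side by side. The sketch below assumes Python's `re` module and a sample text with one to four consecutive e letters:

```python
import re

# Assumed sample text: words with 1, 2, 3, and 4 consecutive e letters.
text = "ber beer beeer beeeer"

exactly_two = re.findall(r"be{2}r", text)      # exactly 2
at_least_three = re.findall(r"be{3,}r", text)  # 3 or more
one_to_three = re.findall(r"be{1,3}r", text)   # between 1 and 3

print(exactly_two)     # ['beer']
print(at_least_three)  # ['beeer', 'beeeer']
print(one_to_three)    # ['ber', 'beer', 'beeer']
```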
Grouping
Parentheses ( ): Grouping
We can group an expression and use these groups to reference or enforce some rules. To group an expression, we enclose it in parentheses (). For now, just group haa below.
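One thing grouping makes possible is repeating a whole sequence rather than a single character. A minimal sketch in Python's `re` module, with an assumed sample text:

```python
import re

# (haa)+ repeats the whole group, not just the final a.
match = re.search(r"(haa)+", "ha haahaa")

whole_match = match.group(0)  # everything the pattern matched
last_group = match.group(1)   # text captured by the group (last repetition)

print(whole_match)  # haahaa
print(last_group)   # haa
```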
Group References
Referencing a Group
The words ha and haa are grouped below. The first group is reused by writing \1, which avoids typing it again. Here 1 denotes the position of the group in the expression. Type \2 at the end of the expression to refer to the second group.
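A sketch of group references in Python's `re` module. The sample strings are assumptions chosen so that each word appears twice:

```python
import re

# \1 repeats whatever group 1 captured, \2 whatever group 2 captured.
pattern = re.compile(r"(ha)-\1,(haa)-\2")

good = pattern.fullmatch("ha-ha,haa-haa")
bad = pattern.fullmatch("ha-ha,haa-ha")  # second pair does not repeat haa

print(good.groups())  # ('ha', 'haa')
print(bad)            # None
```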
Non-capturing Grouping
Parentheses (?: ): Non-capturing Grouping
You can group an expression and ensure that it is not captured by references. For example, below are two groups. However, the first group reference, \1, actually refers to the second group, because the first is a non-capturing group.
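The numbering effect of a non-capturing group can be checked as follows. This is a sketch in Python's `re` module with an assumed sample string:

```python
import re

# (?:ha) groups without capturing, so (haa) becomes group 1.
match = re.fullmatch(r"(?:ha)-ha,(haa)-\1", "ha-ha,haa-haa")

captured = match.groups()

print(captured)  # ('haa',)
```

Only one group is reported, which confirms that \1 refers to haa.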
Pipe Character |
The pipe lets us specify that an expression can take one of several alternative forms. All the alternatives are written separated by the pipe sign |. This differs from a character set [abc]: character sets operate at the character level, while alternatives operate at the expression level. For example, the following expression would select both cat and Cat. Add another pipe sign | to the end of the expression and type rat so that all words are selected.
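Alternation with the pipe might be written like this in Python's `re` module. The pattern is spelled out without grouping for simplicity, and the sample text is assumed:

```python
import re

# Each alternative is a whole expression, not a single character.
animals = re.findall(r"cat|Cat|rat", "cat Cat rat bat")

print(animals)  # ['cat', 'Cat', 'rat']
```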
Escape Character \
There are special characters that we use when writing regex: { } [ ] ( ) / \ + * . $ ^ | ?
Before we can select these characters themselves, we need to put an escape character \ in front of them. For example, to select the dot . and asterisk * characters in the text, add an escape character \ before each of them.
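A small sketch of escaping, assuming Python's `re` module and an invented sample text. It also uses `re.escape`, a helper in that module that adds the backslashes for you:

```python
import re

text = "1 * 2 = 2. Done."

# \. and \* match the literal dot and asterisk.
symbols = re.findall(r"\.|\*", text)

# Unescaped, the dot matches any character; re.escape makes it literal.
unescaped = re.findall("2.", text)
escaped = re.findall(re.escape("2."), text)

print(symbols)    # ['*', '.', '.']
print(unescaped)  # ['2 ', '2.']
print(escaped)    # ['2.']
```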
Start of The String
Caret Sign ^: Selecting by Line Start
We were using [0-9] to find numbers. To find only numbers at the beginning of a line, prefix this expression with the ^ sign.
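The effect of the caret can be seen by running the pattern with and without it. A sketch in Python's `re` module on an assumed single-line text:

```python
import re

text = "3 apples and 4 pears"

any_digit = re.findall(r"[0-9]", text)       # digits anywhere
leading_digit = re.findall(r"^[0-9]", text)  # only at the very start

print(any_digit)      # ['3', '4']
print(leading_digit)  # ['3']
```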
End of The String
Dollar Sign $: Selecting by End of Line
Let's use the $ sign after the html value to find html only at the end of the line.
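Likewise for the dollar sign. The following sketch assumes Python's `re` module and an invented line of text:

```python
import re

text = "index.html is not page.html"

anywhere = re.findall(r"html", text)  # every occurrence
at_end = re.findall(r"html$", text)   # only the one that ends the text

print(anywhere)  # ['html', 'html']
print(at_end)    # ['html']
```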
Alphanumeric
Word Character \w: Letter, Number and Underscore
The expression \w is used to find letters, numbers and underscore characters. Let's use the expression \w to find word characters in the text.
Non-alphanumeric
Except Word Character \W
The expression \W is used to find characters other than letters, numbers, and underscores.
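The two classes split a text into complementary sets of characters. A sketch assuming Python's `re` module, where \w on text strings also accepts non-ASCII letters by default:

```python
import re

text = "a_1 b!"

word_chars = re.findall(r"\w", text)      # letters, digits, underscore
non_word_chars = re.findall(r"\W", text)  # everything else

print(word_chars)      # ['a', '_', '1', 'b']
print(non_word_chars)  # [' ', '!']
```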
Digits
Number Character \d
\d is used to find only number characters.
Non-digits
Except Number Character \D
\D is used to find non-numeric characters.
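A quick illustration of \d and \D, sketched in Python's `re` module with assumed sample strings:

```python
import re

digits = re.findall(r"\d", "Room 42, floor 3")  # digit characters only
non_digits = re.findall(r"\D", "a1b2")          # everything but digits

print(digits)      # ['4', '2', '3']
print(non_digits)  # ['a', 'b']
```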
Whitespace Characters
Space Character \s
\s is used to find only whitespace characters, such as spaces, tabs, and line breaks.
Non-whitespace Characters
Except Space Character \S
\S is used to find non-whitespace characters.
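The sketch below, in Python's `re` module with an invented sample string, shows that \s covers tabs and line breaks as well as the ordinary space:

```python
import re

text = "a b\tc\nd"

spaces = re.findall(r"\s", text)      # space, tab, newline
non_spaces = re.findall(r"\S", text)  # the visible characters

print(spaces)      # [' ', '\t', '\n']
print(non_spaces)  # ['a', 'b', 'c', 'd']
```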
Lookaround
If we want the phrase we're writing to match only when it comes before or after another phrase, without including that other phrase in the match, we need to use a "lookaround".
Lookaround: Positive Lookahead
Positive Lookahead: (?=)
For example, we want to select the hour value in the text. To select only the numerical values that have PM after them, we write the positive lookahead expression (?=) after our expression. Include PM after the = sign inside the parentheses.
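A positive lookahead could be tried like this. The sketch uses Python's `re` module, and the sample text with a date and an hour is an assumption:

```python
import re

# Assumed sample text with one number followed by PM.
text = "Date: 4 Aug 3PM"

# (?=PM) checks that PM follows, without making PM part of the match.
hours = re.findall(r"\d+(?=PM)", text)

print(hours)  # ['3']
```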
Lookaround: Negative Lookahead
Negative Lookahead: (?!)
For example, we want to select numbers other than the hour value in the text. To select only the numerical values that do not have PM after them, we write the negative lookahead expression (?!) after our expression. Include PM after the ! sign inside the parentheses.
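The negative form, sketched in Python's `re` module on an assumed sample text:

```python
import re

# Assumed sample text with one number followed by PM.
text = "Date: 4 Aug 3PM"

# (?!PM) rejects a number when PM comes right after it.
not_hours = re.findall(r"\d+(?!PM)", text)

print(not_hours)  # ['4']
```

With a multi-digit hour such as 12PM, this simple pattern would still match the 1, because the 1 is followed by 2 rather than PM. A stricter pattern would be needed for that case.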
Lookaround: Positive Lookbehind
Positive Lookbehind: (?<=)
For example, we want to select the price value in the text. To select only the number values that are preceded by $, we write the positive lookbehind expression (?<=) before our expression. Add \$ after the = sign inside the parentheses.
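A positive lookbehind in Python's `re` module might look like the sketch below. The product text is an assumed sample:

```python
import re

# Assumed sample text with a product code and a price.
text = "Product Code: 1064 Price: $5"

# (?<=\$) requires a dollar sign just before the digits.
prices = re.findall(r"(?<=\$)\d+", text)

print(prices)  # ['5']
```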
Lookaround: Negative Lookbehind
Negative Lookbehind: (?<!)
For example, we want to select numbers in the text other than the price value. To select only the numeric values that are not preceded by $, we write the negative lookbehind expression (?<!) before our expression. Add \$ after the ! inside the parentheses.
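And the negative lookbehind, again as a sketch in Python's `re` module with an assumed sample text:

```python
import re

# Assumed sample text with a product code and a single-digit price.
text = "Product Code: 1064 Price: $5"

# (?<!\$) rejects digits that come right after a dollar sign.
non_prices = re.findall(r"(?<!\$)\d+", text)

print(non_prices)  # ['1064']
```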
Flags
Flags change the output of the expression, which is why they are also called modifiers. A flag determines whether the expression treats the text as separate lines, is case-sensitive, or finds all matches.
Flags: Global
The global flag causes the expression to select all matches. If it is not used, only the first match is selected. Now enable the global flag to select all matches.
/g all matches
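Not every language spells this as a /g flag. In Python's `re` module (the language assumed for this sketch), the choice is made by picking a function: `re.search` returns the first match and `re.findall` returns all of them. The sample text is invented:

```python
import re

text = "domain.com, test.com, site.com"
pattern = r"\w+\.com"

first_only = re.search(pattern, text).group()  # like matching without /g
all_matches = re.findall(pattern, text)        # like matching with /g

print(first_only)   # domain.com
print(all_matches)  # ['domain.com', 'test.com', 'site.com']
```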
Flags: Multiline
By default, regex treats all the text as one line. The multiline flag lets us handle each line separately, so that expressions written against the start or end of a line apply to each line individually. Now enable the multiline flag to find all matches.
/m multiline
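The difference the multiline flag makes can be sketched in Python's `re` module, where the flag is called `re.MULTILINE`. The three-line sample text is an assumption:

```python
import re

text = "domain.com\ntest.com\nsite.com"
pattern = r"\w+\.com$"

# Without the flag, $ only matches at the end of the whole text.
single_line = re.findall(pattern, text)

# With re.MULTILINE, $ also matches at the end of every line.
per_line = re.findall(pattern, text, flags=re.MULTILINE)

print(single_line)  # ['site.com']
print(per_line)     # ['domain.com', 'test.com', 'site.com']
```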
Flags: Case Insensitive
To make the expression we have written case-insensitive, we must activate the case-insensitive flag.
/i case insensitive
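In Python's `re` module (assumed here), the equivalent of /i is `re.IGNORECASE`. A minimal sketch with an invented sample text:

```python
import re

text = "domain.com, TEST.COM"
pattern = r"\w+\.com"

case_sensitive = re.findall(pattern, text)
case_insensitive = re.findall(pattern, text, flags=re.IGNORECASE)

print(case_sensitive)    # ['domain.com']
print(case_insensitive)  # ['domain.com', 'TEST.COM']
```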
Greedy Matching
Regex does a greedy match by default. This means that the match will be as long as possible. Check out the example below. It matches any run of characters that ends in r. But it does not stop at the first letter r.
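Greedy behaviour is easy to observe with a pattern such as .*r, used here as an assumed example in Python's `re` module on an invented sample text:

```python
import re

text = "ber beer beeer beeeer"

# .* grabs as much as it can, so the match runs to the last r.
greedy = re.search(r".*r", text).group()

print(greedy)  # ber beer beeer beeeer
```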
Lazy Matching
Lazy matching, unlike greedy matching, stops at the first possible match. For example, in the example below, add a ? after * to find the first match that ends with the letter r and is preceded by any characters. This match will stop at the first letter r.
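Adding ? after * switches the quantifier to lazy mode. A sketch in Python's `re` module, using the assumed pattern .*?r and an invented sample text:

```python
import re

text = "ber beer beeer beeeer"

# .*? takes as little as possible, so the match ends at the first r.
lazy = re.search(r".*?r", text).group()

print(lazy)  # ber
```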