RegExp (regular expression) object
Regular expressions are a powerful tool for performing pattern matches on strings in JavaScript. With just a few lines of code they handle complex tasks that once required lengthy procedures. A regular expression can be created in JavaScript in two ways:
Literal syntax:
//match all 7 digit numbers
var phonenumber= /\d{7}/
Dynamically, with the RegExp() constructor:
//match all 7 digit numbers (note how "\d" is defined as "\\d")
var phonenumber=new RegExp("\\d{7}", "g")
A pattern passed to RegExp() should be enclosed in quotes, with any backslashes doubled so that special characters retain their meaning (ie: "\d" must be written as "\\d"). The RegExp() constructor lets you build the search pattern as a string, which is useful when the pattern is not known ahead of time.
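As a minimal sketch of building a pattern at run time, the following assumes the search term arrives as plain text (for example from user input). The helper names escapeRegExp and wordMatcher are hypothetical, introduced only for illustration; the escaping step keeps characters such as "." or "$" in the term from being read as pattern syntax.

```javascript
// Hypothetical helper: escape regex metacharacters in arbitrary text
// so the text is matched literally inside a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a whole-word, case-insensitive, global
// matcher from a term that is only known at run time.
function wordMatcher(word) {
  return new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
}

const matcher = wordMatcher("cat");
const found = "Cat, concat, cat.".match(matcher);
console.log(found); // ["Cat", "cat"]
```

The "cat" inside "concat" is skipped because "\\b" requires a word boundary on both sides of the term.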
Pattern flags (switches)
Property | Description | Example |
---|---|---|
i | Ignore the case of characters. | /The/i matches "the" and "The" and "tHe" |
g | Global search for all occurrences of a pattern | /ain/g matches both "ain"s in "No pain no gain", instead of just the first. |
gi | Global search, ignore case. | /it/gi matches all "it"s in "It is our IT department" |
m | Multiline mode. Causes ^ to match beginning of line or beginning of string. Causes $ to match end of line or end of string. JavaScript1.5+ only. | /hip$/m matches "hip" as well as "hip\nhop" |
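The flags in the table can be compared side by side. This is a small illustrative sketch using the table's own sample strings; the variable names are assumptions made for the example.

```javascript
const text = "No pain no gain";

// Without "g", match() returns only the first match (plus its index).
const first = text.match(/ain/);
// With "g", match() returns every occurrence.
const all = text.match(/ain/g);

// "g" and "i" combined: every occurrence, ignoring case.
const anyCase = "It is our IT department".match(/it/gi);

// "m" lets $ match at the end of each line, not only the end of the string.
const singleLine = /hip$/.test("hip\nhop");
const multiLine = /hip$/m.test("hip\nhop");

console.log(first[0], first.index); // "ain" 4
console.log(all);                   // ["ain", "ain"]
console.log(anyCase);               // ["It", "IT"]
console.log(singleLine, multiLine); // false true
```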
Position Matching
Symbol | Description | Example |
---|---|---|
^ | Only matches the beginning of a string. | /^The/ matches "The" in "The night" but not "In The Night" |
$ | Only matches the end of a string. | /and$/ matches "and" in "Land" but not "landing" |
\b | Matches any word boundary (test characters must exist at the beginning or end of a word within the string) | /ly\b/ matches "ly" in "This is really cool." |
\B | Matches any non-word boundary. | /\Bor/ matches “or” in "normal" but not "origami." |
(?=pattern) | A positive look ahead. Requires that the following pattern is within the input at the current position. Pattern is not included as part of the actual match. | /\d+(?= chapters)/ matches any digits when they're followed by the word " chapters", such as 2 in "2 chapters", though not in "I have 2 kids." |
(?!pattern) | A negative look ahead. Requires that the following pattern is not within the input. Pattern is not included as part of the actual match. | /JavaScript(?! Kit)/ matches any occurrence of the word "JavaScript" except when it's inside the phrase "JavaScript Kit" |
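The position symbols above match a location rather than characters. The sketch below is only an illustration; the sample sentences and variable names are assumptions, and each result can be checked against the table.

```javascript
// Anchors: ^ ties the match to the start of the string.
const startsWithThe = /^The/.test("The night");    // true
const notAtStart = /^The/.test("In The Night");    // false

// \b is a word boundary, \B is anywhere that is not a word boundary.
const endsWord = /ly\b/.test("This is really cool."); // true
const insideWord = /\Bor/.test("normal");             // true
const atWordStart = /\Bor/.test("origami");           // false

// Lookaheads test what follows without consuming it.
const digits = "We read 2 chapters".match(/\d+(?= chapters)/); // ["2"]
const noDigits = "I have 2 kids".match(/\d+(?= chapters)/);    // null
const renamed = "I like JavaScript and JavaScript Kit"
  .replace(/JavaScript(?! Kit)/g, "JS");

console.log(renamed); // "I like JS and JavaScript Kit"
```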
Literals
Symbol | Description |
---|---|
Alphanumeric | All alphabetical and numerical characters match themselves literally. So /2 days/ will match "2 days" inside a string. |
\0 | Matches the NUL character. |
\n | Matches a new line character |
\f | Matches a form feed character |
\r | Matches carriage return character |
\t | Matches a tab character |
\v | Matches a vertical tab character |
[\b] | Matches a backspace. |
\xxx | Matches the character whose code is the octal number xxx. "\50" matches the left parenthesis character "(". |
\xdd | Matches the character whose code is the hex number dd. "\x28" matches the left parenthesis character "(". |
\uxxxx | Matches the Unicode character whose code is the four hex digits xxxx. "\u00A3" matches "£". |
The backslash (\) is also used when you wish to match a special character literally. For example, if you wish to match the symbol "$" literally instead of having it signal the end of the string, backslash it: /\$/
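A short sketch of the escapes above in use. The sample strings and variable names are assumptions chosen for the example.

```javascript
// A backslash makes "$" a literal dollar sign instead of an end anchor.
const prices = "$5.00 and $7".match(/\$\d+/g); // ["$5", "$7"]

// Hex and Unicode escapes name a character by its code.
const parenByHex = /\x28/.test("(");       // true
const poundByUnicode = /\u00A3/.test("£"); // true

// Control characters: tab, and backspace written as [\b].
const hasTab = /\t/.test("a\tb");          // true
const hasBackspace = /[\b]/.test("a\bb");  // true

console.log(prices, parenByHex, poundByUnicode, hasTab, hasBackspace);
```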
Character Classes
Symbol | Description | Example |
---|---|---|
[xyz] | Match any one character enclosed in the character set. You may use a hyphen to denote a range. For example, /[a-z]/ matches any lowercase letter in the alphabet, /[0-9]/ any single digit. | /[AN]BC/ matches "ABC" and "NBC" but not "BBC" since the leading "B" is not in the set. |
[^xyz] | Match any one character not enclosed in the character set. The caret indicates that none of the enclosed characters should be matched. NOTE: the caret used within a character class is not to be confused with the caret that denotes the beginning of a string. Negation is only performed within the square brackets. | /[^AN]BC/ matches "BBC" but not "ABC" or "NBC". |
. | (Dot). Match any character except newline or another Unicode line terminator. | /b.t/ matches "bat", "bit", "bet" and so on. |
\w | Match any single alphanumeric character including the underscore. Equivalent to [a-zA-Z0-9_]. | /\w+/ matches "200" in "200%" |
\W | Match any single non-word character. Equivalent to [^a-zA-Z0-9_]. | /\W/ matches "%" in "200%" |
\d | Match any single digit. Equivalent to [0-9]. | |
\D | Match any single non-digit. Equivalent to [^0-9]. | /\D/ matches "N", the first non-digit, in "No 342222" |
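\s | Match any single whitespace character. Roughly equivalent to [ \t\r\n\v\f]; current engines also match other Unicode whitespace such as the non-breaking space. | |
\S | Match any single non-whitespace character. The negation of \s, roughly [^ \t\r\n\v\f]. | |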
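The character classes above can be tried out with a few one-liners. This is an illustrative sketch only; the sample data and variable names are assumptions.

```javascript
const channels = ["ABC", "NBC", "BBC"];

// [AN] accepts "A" or "N" as the first letter; [^AN] accepts anything else.
const inSet = channels.filter((c) => /[AN]BC/.test(c));     // ["ABC", "NBC"]
const notInSet = channels.filter((c) => /[^AN]BC/.test(c)); // ["BBC"]

// Shorthand classes match a single character; add "+" to take a run of them.
const word = "200%".match(/\w+/)[0];         // "200"
const nonWord = "200%".match(/\W/)[0];       // "%"
const digits = "No 342222".match(/\d+/)[0];  // "342222"
const pieces = "a b\tc".split(/\s/);         // ["a", "b", "c"]

console.log(inSet, notInSet, word, nonWord, digits, pieces);
```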
Repetition
Symbol | Description | Example |
---|---|---|
{x} | Match exactly x occurrences of a regular expression. | /\d{5}/ matches 5 digits. |
{x,} | Match x or more occurrences of a regular expression. | /\s{2,}/ matches at least 2 whitespace characters. |
{x,y} | Matches x to y number of occurrences of a regular expression. | /\d{2,4}/ matches at least 2 but no more than 4 digits. |
? | Match zero or one occurrences. Equivalent to {0,1}. "?" can also be used following one of the quantifiers (*, +, ? or {}) to make it non-greedy, so that it matches as few characters as possible. | /a\s?b/ matches "ab" or "a b". /\d{2,4}?/ matches "12" in the string "12345" instead of "1234" due to "?" at the end of the quantifier. |
* | Match zero or more occurrences. Equivalent to {0,}. | /we*/ matches "w" in "why" and "wee" in "between", but nothing in "bad" |
+ | Match one or more occurrences. Equivalent to {1,}. | /fe+d/ matches both "fed" and "feed" |
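The difference between a greedy quantifier and its non-greedy form (with a trailing "?") is easiest to see side by side. The following is a small sketch; the HTML-like sample string is an assumption added for illustration.

```javascript
// Greedy: take as many digits as allowed. Non-greedy: take as few as allowed.
const greedyDigits = "12345".match(/\d{2,4}/)[0]; // "1234"
const lazyDigits = "12345".match(/\d{2,4}?/)[0];  // "12"

// The same effect with "+": greedy runs to the last ">", lazy stops at the first.
const greedyTag = "<b>bold</b>".match(/<.+>/)[0]; // "<b>bold</b>"
const lazyTag = "<b>bold</b>".match(/<.+?>/)[0];  // "<b>"

// "*" allows zero occurrences, "+" requires at least one.
const star = "between".match(/we*/)[0]; // "wee"
const plusOk = /fe+d/.test("feed");     // true
const plusFail = /fe+d/.test("fd");     // false

console.log(greedyDigits, lazyDigits, greedyTag, lazyTag, star, plusOk, plusFail);
```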
Alternation & Grouping
Symbol | Description | Example |
---|---|---|
( ) | Grouping characters together to create a clause. May be nested. | /(abc)+(def)/ matches one or more occurrences of "abc" followed by one occurrence of "def". |
( ) | Apart from grouping characters (see above), parentheses also capture the subpattern they enclose. After a match, the captured values can be read from the array returned by match() or exec(), or from the legacy RegExp.$1, RegExp.$2 etc properties. For example, given var mystring="We read 2 chapters in 3 days", a pattern such as /(\d+) chapters/ matches "2 chapters" and furthermore isolates the value "2". The subpattern can also be back referenced later within the main pattern. See "Back references" below. | The following finds the text "John Doe" and swaps the two names, so it becomes "Doe John": "John Doe".replace(/(John) (Doe)/, "$2 $1") |
(?:x) | Matches x but does not capture it. In other words, no numbered references are created for the items within the parentheses. | /(?:.d){2}/ matches but doesn't capture "cdad". |
x(?=y) | Positive lookahead: Matches x only if it's followed by y. Note that y is not included as part of the match, acting only as a required condition. | /George(?= Bush)/ matches "George" in "George Bush" but not "George Michael" or "George Orwell". /Java(?=Script|Hut)/ matches "Java" in "JavaScript" or "JavaHut" but not "JavaLand". |
x(?!y) | Negative lookahead: Matches x only if it's NOT followed by y. Note that y is not included as part of the match, acting only as a required condition. | /^\d+(?! years)/ matches "5" in "5 days" or "5 oranges", but not "5 years". |
| | Alternation combines clauses into one regular expression and then matches any of the individual clauses. Similar to an "OR" statement. | /forever|young/ matches "forever" or "young". /(ab)|(cd)|(ef)/ matches and remembers "ab" or "cd" or "ef". |
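The grouping symbols above can be sketched as follows. The pattern /(\d+) chapters/ and the variable names are assumptions chosen for illustration; captured values are read here from the array that match() returns.

```javascript
const sentence = "We read 2 chapters in 3 days";

// Capturing group: index 0 is the whole match, index 1 the first group.
const capture = sentence.match(/(\d+) chapters/);
console.log(capture[0], capture[1]); // "2 chapters" "2"

// Captured groups can be reordered in the replacement text.
const swapped = "John Doe".replace(/(John) (Doe)/, "$2 $1"); // "Doe John"

// Non-capturing group: grouped for repetition, but nothing is remembered.
const nonCapturing = "cdad".match(/(?:.d){2}/); // ["cdad"], no extra entries

// Lookahead as a condition, and alternation as "OR".
const names = ["George Bush", "George Michael"];
const bushOnly = names.filter((n) => /George(?= Bush)/.test(n));
const either = "forever young".match(/forever|young/g);

console.log(swapped, nonCapturing.length, bushOnly, either);
```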
Back references
Symbol | Description |
---|---|
( )\n | "\n" (where n is a number from 1 to 9), placed after a subpattern within a regular expression, back references that subpattern, so the text the subpattern matched is remembered and must appear again at that point. A subpattern is created by surrounding it with parentheses within the pattern. Think of "\n" as a dynamic variable that is replaced with the value of the subpattern it references. For example, /(hubba)\1/ is equivalent to the pattern /hubbahubba/, as "\1" is replaced with the value of the first subpattern within the pattern, or (hubba), to form the final pattern. Let's say you want to match any word that occurs twice in a row, such as "hubba hubba." The expression to use would be /(\w+)\s+\1/. Here "\1" is replaced with the value of the first subpattern's match to essentially mean "match any word, followed by whitespace, followed by the same word again". If there is more than one set of parentheses in the pattern, use \2 or \3 to reference the desired subpattern, counting by the order of the left parenthesis for that subpattern. In the example /(a (b (c)))/, "\1" references (a (b (c))), "\2" references (b (c)), and "\3" references (c). |
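As a sketch of back references in practice, the following finds words that appear twice in a row. The function name findRepeatedWords is hypothetical, and the added "\b" anchors are an assumption made here so that partial words (such as "is" inside "this is") are not counted.

```javascript
// Hypothetical helper: list every word that is immediately repeated.
// \1 must match the same text that the first group (\w+) captured.
function findRepeatedWords(text) {
  const pattern = /\b(\w+)\s+\1\b/gi;
  const repeated = [];
  for (const match of text.matchAll(pattern)) {
    repeated.push(match[1]);
  }
  return repeated;
}

const repeats = findRepeatedWords("It is is a test test here");
console.log(repeats); // ["is", "test"]

// Groups are numbered by the position of their left parenthesis.
const nested = "a b c".match(/(a (b (c)))/);
console.log(nested[1], nested[2], nested[3]); // "a b c" "b c" "c"
```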
Regular Expression methods
Method | Description | Example |
---|---|---|
String.match(regular expression) | Executes a search for a match within a string based on a regular expression. It returns an array of information or null if no match is found. With the "g" flag the array holds every match; without it, the array holds the first match followed by its captured subpatterns. Note: Also updates the $1…$9 properties in the RegExp object. | var oldstring="Peter has 8 dollars and Jane has 15"; var newstring=oldstring.match(/\d+/g) //returns the array ["8","15"] |
RegExp.exec(string) | Similar to String.match() above in that it returns an array of information or null if no match is found. Unlike String.match() however, the parameter entered should be a string, not a regular expression pattern. | var match = /s(amp)le/i.exec("Sample text") //returns ["Sample","amp"] |
String.replace(regular expression, replacement text) | Searches for the regular expression and replaces the matched portion with the replacement text. For the "replacement text" parameter, you can use the keywords $1 to $99 to replace the original text with values from subpatterns defined within the main pattern. The following finds the text "John Doe" and swaps the two names, so it becomes "Doe John": var newname="John Doe".replace(/(John) (Doe)/, "$2 $1") The following characters carry special meaning inside "replacement text": $$ inserts a literal "$", $& inserts the matched text, $` inserts the text before the match, and $' inserts the text after the match. The "replacement text" parameter can also be substituted with a callback function instead. See example below. | var oldstring="(304)434-5454"; var newstring=oldstring.replace(/[\(\)-]/g, "") //returns "3044345454" (removes "(", ")", and "-") |
String.split (string literal or regular expression) | Breaks up a string into an array of substrings based on a regular expression or fixed string. | var oldstring="1,2, 3, 4, 5"; var newstring=oldstring.split(/\s*,\s*/) //returns the array ["1","2","3","4","5"] |
String.search(regular expression) | Tests for a match in a string. It returns the index of the match, or -1 if not found. Does NOT support global searches (ie: "g" flag not supported). | "Amy and George".search(/george/i) //returns 8 |
RegExp.test(string) | Tests if the given string matches the RegExp, and returns true if matching, false if not. | var pattern=/george/i; pattern.test("Amy and George") //returns true |
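The methods in the table can be combined in a few lines. This is a minimal sketch using the table's sample sentence; the variable names are assumptions, and the exec() loop relies on the "g" flag so that each call resumes where the previous match ended.

```javascript
const text = "Peter has 8 dollars and Jane has 15";

// exec() with the "g" flag resumes from lastIndex on each call,
// so a loop can collect every match together with its position.
const numberPattern = /\d+/g;
const hits = [];
let hit;
while ((hit = numberPattern.exec(text)) !== null) {
  hits.push({ value: hit[0], index: hit.index });
}

// replace() with a callback: double every number found.
const doubled = text.replace(/\d+/g, (n) => String(Number(n) * 2));

// Special replacement patterns: $& is the whole match, $1 and $2 are captures.
const bracketed = "price 5".replace(/\d+/, "[$&]");          // "price [5]"
const swapped = "John Doe".replace(/(\w+) (\w+)/, "$2, $1"); // "Doe, John"

// split() on a pattern tolerates inconsistent spacing around commas.
const parts = "1,2, 3,  4 , 5".split(/\s*,\s*/);

console.log(hits, doubled, bracketed, swapped, parts);
```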
Example- Replace "<", ">", "&" and quotes (" and ') with the equivalent HTML entity instead
function html2entities(sometext){
var re=/[<>"'&]/g //match any one of the five characters to be replaced
return sometext.replace(re, function(m){return replacechar(m)})
}
function replacechar(match){
if (match=="<")
return "&lt;"
else if (match==">")
return "&gt;"
else if (match=="\"")
return "&quot;"
else if (match=="'")
return "&#39;"
else if (match=="&")
return "&amp;"
}
//replace "<", ">", "&" and quotes in a form field with the corresponding HTML entity instead
document.form.namefield.value=html2entities(document.form.namefield.value)