September 1999

Using JavaScript Regular Expressions for Data Validation

In our article, "Data validation tips and techniques," in the July 1999 issue, we offered some simple, reusable functions that relied on String methods to test user-entered text data. That article presented the more common approach to data validation, but there is another way that is more efficient, though also more complex. The alternative approach uses JavaScript regular expressions: sequences of characters and symbols that define a pattern of text, which can then be used to test the contents of any text string. To demonstrate just a fraction of the value and power of regular expressions, we'll use them this month to rewrite the data validation code that appeared in July. As a sample application, we'll use the same Web page that we created for the July issue, except that we've added two new text fields to validate: Email and Phone. The page we're using is shown in Figure A.

Figure A: We'll re-use the Web page we created for our July issue to illustrate using regular expressions for data validation.

[ Figure A ]

Regular expressions: a brief introduction

A JavaScript regular expression is created by surrounding a text pattern with forward slashes. As a simple example, suppose you want to find out whether the word top appears in a text string. You could use a regular expression that looks like this:

var myExp = /top/

The slashes create a Regular Expression (RegExp) object, and assigning it to a variable lets you reuse it and call all of the object's properties and methods. One particularly useful method is test, which returns true if the pattern matches somewhere in a text string and false otherwise. For example, the regular expression myExp that we just created can test whether its pattern appears in a given text string with a statement like:

var found = myExp.test("Stop at the big top")

Given the above statement, the variable found becomes true, because the regular expression matches the first occurrence of the letters top, which appear inside the word Stop.

But suppose that what we really wanted to find was the actual word top, rather than just a part of the word Stop? We can modify our regular expression using metacharacters that can further define the pattern. For example,

var myExp = /\btop/

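places the metacharacter \b, which matches a word boundary, in front of the pattern, so top matches only where it begins a word. (That still includes the start of a longer word such as topping; to match the whole word only, put a boundary at both ends: /\btop\b/.)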
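To see the difference, here is a small sketch in present-day JavaScript that runs both patterns against the sample string. It uses the RegExp exec method, which returns a match object (or null); the match object's index property gives the position where the match starts. The variable names are illustrative, not part of the article's script.

```javascript
// Compare a plain pattern with one anchored to a word boundary.
const text = "Stop at the big top";

const plain = /top/;       // matches "top" anywhere, even inside "Stop"
const wordStart = /\btop/; // "top" must begin at a word boundary

console.log(plain.test(text));           // true
console.log(plain.exec(text).index);     // 1  (the "top" inside "Stop")
console.log(wordStart.exec(text).index); // 16 (the standalone word "top")

// \btop still matches the start of longer words such as "topping";
// a second \b after the pattern requires the word to end there too.
console.log(/\btop/.test("topping"));    // true
console.log(/\btop\b/.test("topping"));  // false
```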

Some metacharacters also determine how many times a matching character or symbol should occur. For example, to match a number with three digits, the pattern would look like this, where \d stands for a single digit:

var myExp = /\d{3}/

Flags, placed after the closing slash, specify whether the regular expression should find all occurrences of a pattern (g) or match both lower-case and upper-case letters (i). Other metacharacters specify where in the string the pattern must be found. A list of regular expression metacharacters and flags, with their descriptions, appears in Table A. In our validation application, we'll show you how many of these can be combined to form regular expressions.
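The following sketch, using made-up sample strings, illustrates two of these points: an unanchored /\d{3}/ is satisfied by any run of three digits inside a longer string (which is why the validation patterns later add ^ and $), and the g and i flags change what String.match returns.

```javascript
// \d{3} only needs three digits somewhere in the string.
const threeDigits = /\d{3}/;
console.log(threeDigits.test("12345"));  // true
console.log(threeDigits.test("12-345")); // true (finds "345")

// Anchoring with ^ and $ requires the whole string to be three digits.
console.log(/^\d{3}$/.test("12345"));    // false

// Flags go after the closing slash.
const s = "Top, TOP and top";
console.log(s.match(/top/g));  // [ 'top' ]  every match, case-sensitive
console.log(s.match(/top/gi)); // [ 'Top', 'TOP', 'top' ]  every match, any case
```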

Table A: Regular expression metacharacters

Character	Matches
\b	Word boundary
\B	Word nonboundary
\d	Numeral 0 through 9
\D	Non-numeral
\s	Single white-space character
\S	Single non-white-space character
\w	Letter, numeral, or underscore
\W	Not a letter, numeral, or underscore
.	Any character except newline
[...]	Character set (any one of the enclosed characters)
[^...]	Negated character set (any one character not enclosed)
*	Preceding item occurring zero or more times
?	Preceding item occurring zero or one time
+	Preceding item occurring one or more times
{n}	Preceding item occurring exactly n times
{n,}	Preceding item occurring n or more times
{n,m}	Preceding item occurring at least n, at most m times
^	Pattern occurring at the beginning of the string
$	Pattern occurring at the end of the string
|	OR operator
g	Flag (after the closing slash): all occurrences of the pattern; that is, (g)lobal
i	Flag (after the closing slash): case-(i)nsensitive occurrences of the pattern

Note: To include a character in a pattern that could be interpreted as a metacharacter, put the escape symbol (\) in front of the character. For example, the regular expression /\d{3}\.\d{2}/ specifies a pattern of three digits, followed by a period, followed by two digits. We've used the escape symbol in front of the period because we want to match an actual period, not the metacharacter (.), which matches any character.
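A quick way to see why the escape matters is to compare the escaped and unescaped versions. This sketch adds ^ and $ anchors (not in the note's example) so that the whole string must match; the sample inputs are made up.

```javascript
// Escaped period: matches only a literal "."
const escaped = /^\d{3}\.\d{2}$/;
// Unescaped period: matches any single character except a newline
const unescaped = /^\d{3}.\d{2}$/;

console.log(escaped.test("123.45"));   // true
console.log(escaped.test("123x45"));   // false
console.log(unescaped.test("123x45")); // true: "." accepted the "x"
```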

Regular expressions for data validation

With this very basic introduction in mind, let's look at some examples of regular expressions that we can use for validating and reformatting user-entered data. We'll then use the expressions to rewrite the validation functions we developed in our July issue, and add two new functions to our repertoire as well.

Null and blank strings

With regular expressions, we can rewrite the notBlank and notNull functions we used in the July article to more specifically address the needs of our data. For example, if we require only that the text box not be left blank, we can use the regular expression:

var charexp = /./

The period will find a match as long as the text box contains any character at all, except the newline character. However, we can also be more specific. To test whether a text box contains at least one alphabetical character, a-z, the regular expression would be:

var letterexp = /[a-z]/i

The pattern contains the character set of any letter from a to z, so any letter would serve as a match. The i at the end of the expression specifies that either an upper-case or lower-case letter is acceptable.
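To compare the two tests side by side, here is a short sketch with made-up inputs. It shows that /./ accepts a box containing only spaces, while /[a-z]/i requires at least one letter.

```javascript
const charexp = /./;
const letterexp = /[a-z]/i;

console.log(charexp.test(""));          // false: empty box
console.log(charexp.test("   "));       // true: spaces count as characters
console.log(letterexp.test("   "));     // false: no letters
console.log(letterexp.test("42"));      // false: digits only
console.log(letterexp.test("O'Brien")); // true
```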

Member number

Our membership number must have exactly three digits (000 through 999), and we don't want any other extraneous characters before or after. The regular expression to represent this pattern would be:

var memberexp = /^\d{3}$/

The pattern specifies that, starting at the beginning of the string (^) there should be exactly three digits (\d{3}), at which the string should end ($).
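Here is a minimal check of the member-number pattern against a few made-up entries, showing that the anchors reject both short and long numbers as well as stray characters.

```javascript
const memberexp = /^\d{3}$/;

// Only "042" passes: the others are too short, too long,
// have a leading space, or contain a letter.
for (const entry of ["042", "42", "1234", " 042", "04a"]) {
  console.log(JSON.stringify(entry), memberexp.test(entry));
}
```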

ZIP code

This one is just a little bit trickier, because we have the option of either a five-digit or a nine-digit ZIP code. Here's the regular expression we can use:

var zipexp = /^\d{5}$|^\d{5}[\-\s]?\d{4}$/

We'll take advantage of the OR (|) operator so that we can test for either 5 or 9 digits in one expression. To the left of the |, from the beginning of the string (^) to the end ($), we want exactly 5 digits (\d{5}). Or (now the right half of the expression, after the |), we want 5 digits, followed by zero or one (?) dash (\-) or white-space character (\s), followed by 4 more digits (\d{4}), at which point the string should end ($).

Placing the dash and the \s in brackets ([]) makes them a character set, so either one is a match. Note that the dash has a backslash (escape character) in front of it. Inside a character set, a dash between two characters defines a range, as in [a-z], so escaping it makes clear that we mean a literal dash. The question mark (?), indicating zero or one occurrence, is placed immediately after the item it modifies (in our case, the character set of dash or space).
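The sketch below runs the ZIP code pattern over a few made-up entries. The lists of accepted and rejected values are illustrations chosen to exercise each part of the expression, not test data from the article.

```javascript
const zipexp = /^\d{5}$|^\d{5}[\-\s]?\d{4}$/;

// Five digits, or five plus four with an optional dash or space.
const accepted = ["08401", "08401-1234", "08401 1234", "084011234"];
// Too short, too few trailing digits, two dashes, trailing junk.
const rejected = ["0840", "08401-123", "08401--1234", "08401-1234x"];

for (const zip of accepted) console.log(zip, zipexp.test(zip)); // all true
for (const zip of rejected) console.log(zip, zipexp.test(zip)); // all false
```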

Phone

Validating a phone number is easier, because we first strip off every character that isn't a digit. (We'll get to reformatting strings in a minute.) Here's the regular expression:

var phonexp =  /^\d{10}$/

Since we're assuming that we've removed all but the digits, we can expect 10 of them, including the area code. (This is also assuming US phone numbers only; international numbers add an additional level of complexity that we won't address here.)
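The order of operations matters here: the raw entry fails the pattern, and only the stripped version can pass. In this sketch, digitsOnly is a hypothetical helper that does the same job as the stripNonDigits function shown later; the sample numbers are made up.

```javascript
const phonexp = /^\d{10}$/;

// Hypothetical helper: remove every non-digit (g = all occurrences).
const digitsOnly = (str) => str.replace(/[^0-9]/g, "");

const entry = "(609) 555-0123";
console.log(phonexp.test(entry));             // false: punctuation and space
console.log(phonexp.test(digitsOnly(entry))); // true: "6095550123"

// A number without an area code leaves only seven digits.
console.log(phonexp.test(digitsOnly("555-0123"))); // false
```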

Email addresses

Now we get a bit more complicated, since email can take many different forms. Here's the expression:

var emailexp = /^[a-z][a-z_0-9\.]+@[a-z_0-9\.]+\.[a-z]{3}$/i

We're expecting that a valid email address begins with a letter (^[a-z]), followed by one or more letters, digits, underscores, or periods ([a-z_0-9\.]+), followed by an @ sign, followed by one or more letters, digits, underscores, or periods ([a-z_0-9\.]+). Then comes a period (\.), followed by exactly three letters ([a-z]{3}), as in .com, .gov, or .edu, at which point the string ends ($). The i flag makes the match case-insensitive. Note that again, the period had to be escaped, and the plus sign (+) indicates one or more occurrences of a character from the character set that precedes it. Keep in mind that this is a simplified pattern: it rejects some legitimate addresses, such as those containing hyphens or ending in a two-letter country code like .uk.
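The following sketch exercises the article's email pattern (not a recommended modern email check) against some made-up addresses, including a few legitimate-looking ones that it rejects because of the character set and the three-letter ending.

```javascript
const emailexp = /^[a-z][a-z_0-9\.]+@[a-z_0-9\.]+\.[a-z]{3}$/i;

// Sample address -> result the pattern is expected to give.
const samples = {
  "jsmith@example.com": true,
  "J.Smith@Mail.Example.GOV": true, // i flag: case does not matter
  "1smith@example.com": false,      // must start with a letter
  "j-smith@example.com": false,     // hyphen is not in the character set
  "jsmith@example.co.uk": false,    // ending must be exactly three letters
  "jsmith@example.info": false,     // four letters after the last period
};

for (const [address, expected] of Object.entries(samples)) {
  const result = emailexp.test(address);
  console.log(address, result, result === expected ? "as expected" : "SURPRISE");
}
```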

Pledge amount

Checking for a valid pledge amount isn't as complicated, because we can again use a reformatting function to limit the string we're looking at. Assuming that we'll have a string consisting only of digits and an optional decimal point, our expression looks like this:

var pledgexp = /^\d*$|^\d*\.\d{2}$/

The pattern specifies zero or more digits (\d*) and nothing else, OR (|) zero or more digits followed by a decimal point (\d*\.) and exactly two more digits (\d{2}). You might think that if we're allowing zero digits, we should be worried about an empty string, but we can take care of that up front, before we reformat the string. Note that our regular expression also allows for a pledge amount of zero dollars and some number of cents, as a concession to the radio station's less generous contributors.
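A few made-up amounts show how the pledge pattern behaves, including the empty string that it accepts on its own (which is why blank fields need a separate check) and the dollar sign it rejects (which is why the string is reformatted first).

```javascript
const pledgexp = /^\d*$|^\d*\.\d{2}$/;

console.log(pledgexp.test("25"));    // true
console.log(pledgexp.test("25.00")); // true
console.log(pledgexp.test(".50"));   // true: zero dollars and 50 cents
console.log(pledgexp.test("25.5"));  // false: cents need two digits
console.log(pledgexp.test("$25"));   // false: strip the dollar sign first
console.log(pledgexp.test(""));      // true: blank fields must be caught separately
```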

New reformatting functions

In our July issue, we wrote the functions stripChars and stripNonDigits to reformat a text string before we tried to validate it. Now that we have a basic understanding of regular expressions, we can use the String object's replace method to rewrite each function with just one line of code. The replace method takes two parameters: a regular expression used to search the string for a match, and a string to substitute for the match that was found. When called, it replaces the first match with the replacement string and returns the new string; the original string is left unchanged. To replace all occurrences, add the g flag to the regular expression.
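Here is a quick illustration of the difference the g flag makes, using a made-up amount string:

```javascript
const amount = "$1,000,000";

console.log(amount.replace(/,/, ""));  // $1000,000  (first comma only)
console.log(amount.replace(/,/g, "")); // $1000000   (every comma)
console.log(amount);                   // $1,000,000 (original unchanged)
```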

A revised stripChars function

Recall that the function stripChars accepts as parameters the characters to be removed from a text string, and the original string. We can instead pass a regular expression that finds the characters we want to remove. For example, to strip the dollar sign from our pledge amount, we could call stripChars, passing the regular expression and the text box string, with the statement:

newpledge = stripChars(/\$/, form.pledge.value)

This would mean that stripChars now requires just one line of code:

function stripChars(pattern, str) {
	// Replace the match(es) of pattern with an empty string
	return str.replace(pattern, "")
}

The function accepts the regular expression into pattern, and the string from the text box into str. Using the replace method, it replaces the first match of the pattern (in this case, a dollar sign) with an empty string (that is, with nothing), and returns the resulting string. If the pattern includes the g flag, every match is removed instead.
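This sketch calls the one-line stripChars with the article's /\$/ pattern and with one extra pattern, /[$,]/g, which is a hypothetical extension (not used in the article's script) that also removes thousands separators.

```javascript
function stripChars(pattern, str) {
  // Replace the match(es) of pattern with an empty string
  return str.replace(pattern, "");
}

console.log(stripChars(/\$/, "$25.00"));        // 25.00
console.log(stripChars(/\$/, "$$25.00"));       // $25.00 (no g flag: first "$" only)
console.log(stripChars(/[$,]/g, "$1,250.00"));  // 1250.00 (hypothetical pattern)
```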

A revised stripNonDigits function

Rewriting the stripNonDigits function is just as simple. In our example, we use stripNonDigits to strip any extraneous characters from the phone number that the user has entered. Our call to the function only passes the string we want stripped of any non-digits:

newphone = stripNonDigits(form.phone.value)

The function uses the negated character set [^0-9] together with the global flag (g) to replace every character that's not a digit 0-9 with an empty string:

function stripNonDigits(str) {
	// Remove every character that is not a digit
	return str.replace(/[^0-9]/g, "")
}
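Here is the function applied to a couple of made-up entries. The last line shows that the \D metacharacter (non-numeral) from Table A is shorthand for [^0-9] and gives the same result.

```javascript
function stripNonDigits(str) {
  // Remove every character that is not a digit
  return str.replace(/[^0-9]/g, "");
}

console.log(stripNonDigits("(609) 555-0123")); // 6095550123
console.log(stripNonDigits("ext. 42"));        // 42

// \D matches any non-numeral, so this is equivalent.
console.log("(609) 555-0123".replace(/\D/g, "")); // 6095550123
```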

A generic validation function

Now that we've defined regular expressions for each of our text box values, we can create a generic validation function that's passed a regular expression and the text box string it should validate. To actually test for a match between the regular expression and a string, we use the Regular Expression object's test method. Once we've passed the regular expression into one parameter (let's call it pattern) and the string we want to test into a second parameter (say, str), our validation function consists of one statement:

function isValid(pattern, str) {
	return pattern.test(str)
}

where the test method returns true if the string matches the pattern, and false otherwise.
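One caution worth knowing about a generic isValid: if a pattern carries the g flag, test remembers where its last match ended (the RegExp lastIndex property) and starts the next search from there. None of the article's validation patterns uses g, so isValid is safe with them; the globalMember pattern below is a hypothetical counterexample.

```javascript
function isValid(pattern, str) {
  return pattern.test(str);
}

const memberexp = /^\d{3}$/;
console.log(isValid(memberexp, "123")); // true
console.log(isValid(memberexp, "123")); // true again: no g flag, no saved state

// Hypothetical: the same pattern with the g flag added.
const globalMember = /^\d{3}$/g;
console.log(isValid(globalMember, "123")); // true  (lastIndex is now 3)
console.log(isValid(globalMember, "123")); // false (search starts at 3; lastIndex resets to 0)
console.log(isValid(globalMember, "123")); // true
```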

Putting it all together (again)

The full code for our regular expression validation script is found in Listing A. For completeness, the HTML code for the page is in Listing B.

Listing A: Script for data validation with regular expressions

<SCRIPT>
<!--
	
var charexp = /./
var letterexp = /[a-z]/i
var phonexp =  /^\d{10}$/
var memberexp = /^\d{3}$/
var zipexp = /^\d{5}$|^\d{5}[\-\s]?\d{4}$/
var emailexp = /^[a-z][a-z_0-9\.]+@[a-z_0-9\.]+\.[a-z]{3}$/i
var pledgexp = /^\d*$|^\d*\.\d{2}$/

function isValid(pattern, str) {
	return pattern.test(str)
}

function hasLetter(str) {
	return letterexp.test(str)
}

function hasChar(str) {
	return charexp.test(str)
}

function stripChars(pattern, str) {
	return str.replace(pattern, "")
}

function stripNonDigits(str) {
	return str.replace(/[^0-9]/g, "")
}

function checkform(form) {

	//Check the first name text box for an entry
	if  (!hasLetter(form.firstname.value)) {
		alert("Invalid first name")
		form.firstname.focus()
		return false
	}

	//Check the last name text box for an entry
	if (!hasLetter(form.lastname.value)) {
		alert("Invalid last name")
		form.lastname.focus()
		return false
	}

	//Check that the member number entry is valid
	if (!isValid(memberexp,form.membernum.value)) {
		alert("Invalid member number")
		form.membernum.focus()
		return false
	}

	//Check that the ZIP code entry is valid.
	if (!isValid(zipexp,form.zip.value)) {
		alert("Invalid ZIP code")
		form.zip.focus()
		return false
	}

	//Check that the email entry is valid
	if (!isValid(emailexp,form.email.value)) {
		alert("Invalid email")
		form.email.focus()
		return false
	}

	
	//Check that the phone entry is valid by first 
	//stripping off all nondigits.
	var newphone = ""
	var notvalid = true
	if (hasChar(form.phone.value)) {
		newphone = stripNonDigits(form.phone.value)
		notvalid = !isValid(phonexp,newphone)
	}
	if (newphone == "" || notvalid) {
		alert("Invalid phone number - include area code")
		form.phone.focus()
		return false
	}

	//Check that the pledge entry is valid by first 
	//stripping off any dollar sign using stripChars.
	var newpledge = ""
	notvalid = true
	if (hasChar(form.pledge.value)) {
		newpledge = stripChars(/\$/,form.pledge.value)
		notvalid = !isValid(pledgexp,newpledge)
	}
	if (newpledge == "" || notvalid) {
		alert("Please enter a valid dollar amount")
		form.pledge.focus()
		return false
	}
	
	alert("Data valid")
	return true
}

//-->
</SCRIPT>

Listing the regular expressions

The first thing you'll notice about our new script is that the code is a lot shorter. In fact, we've reduced one level of function calls.

Next, notice that we've placed all of our regular expressions at the head of our script. We did this to make them easy to see, and also easy to change if you want to revise them later. Let's now take a look at the checkform function that's called by the Submit button, and which runs the whole show.

Validating name fields with hasLetter

The first thing checkform does is call the hasLetter function, passing the strings in the first name and last name text boxes. hasLetter checks that each string contains at least one alphabetical character. For each text box that checkform tests, just as in our July issue, if the string isn't valid, the user sees a message, focus returns to the text box, and the function returns false to the Submit button's onclick handler.

Validating fields with isValid

After checking the name fields, checkform calls the isValid function for the member number, ZIP code, and email entries, passing their regular expressions along with the strings in each of the text boxes. Refer back to the list of regular expressions at the top of the listing to see what's being passed to isValid each time.

Reformatting phone and pledge fields

Finally, checkform calls stripNonDigits to remove extraneous characters from the phone entry, and calls stripChars to remove the dollar sign ($) from a pledge amount. The logic here is a little trickier. For both the phone and the pledge entries, we start by setting a string variable to an empty string. We then check that the field contains at least one character by calling the function hasChar. If it does, we assign the reformatted string to that variable. Then, we call isValid with the reformatted string. If the string is still empty (because the field was blank, or because nothing was left after stripping) or if isValid found that the string wasn't valid, we alert the user and return false.
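The same strip-then-validate steps can be exercised outside a browser. In this sketch, checkStripped and stripDollar are hypothetical names; the function mirrors checkform's phone and pledge logic but returns true or false instead of calling alert and focus, so it runs without a form.

```javascript
// Hypothetical helper mirroring checkform's phone and pledge steps.
function checkStripped(raw, strip, pattern) {
  if (!/./.test(raw)) return false; // hasChar: field left blank
  const cleaned = strip(raw);       // reformat first
  if (cleaned === "") return false; // nothing left after stripping
  return pattern.test(cleaned);     // then validate
}

const phonexp = /^\d{10}$/;
const pledgexp = /^\d*$|^\d*\.\d{2}$/;
const stripNonDigits = (str) => str.replace(/[^0-9]/g, "");
const stripDollar = (str) => str.replace(/\$/, "");

console.log(checkStripped("(609) 555-0123", stripNonDigits, phonexp)); // true
console.log(checkStripped("", stripNonDigits, phonexp));               // false
console.log(checkStripped("$25.00", stripDollar, pledgexp));           // true
console.log(checkStripped("$", stripDollar, pledgexp));                // false
```

The explicit empty-string check matters for the pledge: pledgexp on its own accepts "", so a lone "$" would otherwise slip through.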

Listing B: HTML code for our data validation page

<HTML>
<HEAD>
<STYLE>
P {font-size:18px}
H1 {text-align:center}
b {text-align:center;font-size:20px}
TEXTAREA {font-weight:900}
</STYLE>

<!-- Script goes here -->

</HEAD>
<BODY>
<P style = "text-align:center;font-size:24px">
	Pinebarren Public Radio
<P style = "text-align:center;font-size:20px;" >
	Pledge Form
<FORM>
<TABLE border="0" width="450" cols = 4>
<TR>
<TD><i>First Name</i></TD>
<TD><i>Last Name</i></TD>
<TD><i>Member#</i></TD>
</TR>
<TR>
<TD><INPUT type="text" name="firstname" size="20"></TD>
<TD><INPUT type="text" name="lastname" size="20"> </TD>
<TD><INPUT type="text" name="membernum" size="15"></TD>
</TR>
<TR>
<TD><I>Address</I></TD>
<TD><I>City</I></TD>
<TD><I>State</I></TD>
<TD><I>ZIP Code</I></TD>
</TR>
<TR>
<TD><INPUT type="text" name="address" size="20"></TD>
<TD><INPUT type="text" name="city" size="20"></TD>
<TD><INPUT type="text" name="state" size="3" ></TD>
<TD><INPUT type="text" name="zip" size="12" ></TD>
</TR>
<TR>
<TD><i>Email</i></TD>
<TD><i>Phone</i></TD>
</TR>
<TR>
<TD><INPUT type="text" name="email" size="20"></TD>
<TD><INPUT type="text" name="phone" size="20"></TD>
</TR>
</TABLE>
<BR>
<BR>
<TABLE>
<TR><TD><i>Pledge Amount</i> </TD>
<TD> <input type="text" name="pledge" size="20"></TD>
</TR>
</TABLE>
<BR>
<INPUT type = "button"  value = "Submit" 
onclick = "return checkform(this.form)">
<INPUT type = "reset"  value = "Reset">
</FORM>
</BODY>
</HTML>

Conclusion

In this article, we gave you a very brief overview of regular expressions, and showed you how they can be used to implement data validation functions. You should know, however, that we've provided only a glimpse of both the power and the complexity of regular expressions, and of the many ways they can be put to use. For more information, pick up Jeffrey Friedl's book, Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools.

Copyright © 1999, ZD Inc. All rights reserved. ZD Journals and the ZD Journals logo are trademarks of ZD Inc. Reproduction in whole or in part in any form or medium without express written permission of ZD Inc. is prohibited. All other product names and logos are trademarks or registered trademarks of their respective owners.