Postal Code Filter Pattern Matching

From AbleCommerce Wiki
Revision as of 17:43, 12 July 2009 by Loganr (Talk | contribs)

When you define a zone in AbleCommerce, you have the option of setting a postal code filter. This allows you to define a very specific area, perhaps to target a single city. In early versions of AbleCommerce, the pattern matching is very simple: there is a single wildcard character, *, that can be used to match any value. For example, 1234* would match 12340 through 12349.
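To make the simple wildcard concrete, here is a rough Python sketch. It is not AbleCommerce code: the helper name simple_wildcard_match is hypothetical, and it assumes that * stands for any run of characters while everything else must match literally.

```python
import re

def simple_wildcard_match(filter_text: str, postal_code: str) -> bool:
    """Hypothetical helper: * matches any run of characters, the rest is literal."""
    # Escape each literal piece so characters like '.' or '-' are not treated specially.
    literal_parts = [re.escape(part) for part in filter_text.split("*")]
    # Rejoin the pieces with '.*' and require the whole postal code to match.
    return re.fullmatch(".*".join(literal_parts), postal_code) is not None

for code in ["12340", "12349", "12350"]:
    print(code, simple_wildcard_match("1234*", code))
```

Under that assumption, 12340 and 12349 are accepted and 12350 is rejected.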

Unfortunately, this wildcard matching proved to be too simple for some scenarios, especially with non-US postal codes. In response, support for a new style of postal code filter was added in AC7.0.1.

The descriptions and examples below will attempt to give a very basic introduction to using this pattern matching. If the explanations seem like gibberish, do not feel alarmed. Configuring regular expressions can be very complex, even for technically savvy people. If you need assistance, our community forum should be a helpful resource.

Common Pattern Elements

Element Description and Example
@ This symbol is not actually a regular expression element, but it is used by AbleCommerce to signal that the postal code filter uses the new pattern matching syntax. All patterns must begin with the @ symbol to activate the regular expression match.

| The vertical bar separates alternate values that should match.

@12345|12347 would match either 12345 or 12347.

( ) Parentheses can be used to group elements.

@12345(-0001) indicates that -0001 is a group.

? The question mark means that the preceding element (or group of elements) appears zero or one time.

@12345(-0001)? indicates that the -0001 may or may not appear. In other words, the pattern would match either 12345 or 12345-0001.

* The asterisk means that the preceding element (or group of elements) appears zero or more times. This will not likely be useful in postal code matching, but is mentioned here because the meaning differs from the simple wildcard usage in postal code filters.

@1234* would match 123, 1234, 12344, 123444, 1234444, and so on.

+ The plus sign means that the preceding element (or group of elements) appears one or more times. Again, this is not likely to be useful in postal code matches.

@1234+ would match 1234, 12344, 123444, 1234444, and so on.

[ ] Square brackets are used to indicate a character set. All characters within are considered to be a match. Also, a hyphen can be used to indicate a range of characters to match.

@[abc] matches a or b or c. @[a-c] also matches a or b or c; the hyphen indicates the range from a to c.

[^] A square bracket can also be used to indicate characters that should NOT be matched, if the first character within the bracket is a caret.

@[^abc] would match any character except a, b, or c. @[^a-c] would also match any character except a, b, or c, using the hyphen to indicate the character range.

{ } Curly brackets can be used to indicate the number of times the preceding element appears.

@A{3} indicates the A appears 3 times, so it would match the value AAA.

@A{3,5} indicates the A appears at least 3 times and at most 5 times, so it would match the values AAA, AAAA, or AAAAA, but not AA or AAAAAA.

@A{3,} indicates the A appears at least 3 times, so it would match the values AAA, AAAA, AAAAA, AAAAAA, and so on.

\d This shorthand character can be used to indicate any digit 0 through 9. It is equivalent to [0-9].

@\d{5} would match any series of 5 digits (like a standard US ZIP code).

^ and $ The caret and dollar sign have special meanings if placed at the beginning and end of a pattern. They indicate that the value must begin and/or end with the pattern.

@^[0-9] matches 0ABC but not ABC0 because the pattern indicates the value must start with a digit.

@[0-9]$ matches ABC0 but not 0ABC because the pattern indicates the value must end with a digit.

@^[0-9]$ matches any single character 0 through 9, but will not match any other value because the pattern indicates the beginning and end of the value and the only valid match is a single digit.
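If you would like to experiment with these elements before entering them in a zone, a short script can help. The sketch below uses Python's re module as a stand-in; AbleCommerce's own matching engine may differ in details. The patterns are written without the @ prefix, since that symbol is specific to AbleCommerce, and the helper names are made up for this illustration.

```python
import re

def whole_value_matches(pattern: str, value: str) -> bool:
    """True when the pattern accounts for the entire value."""
    return re.fullmatch(pattern, value) is not None

def found_in_value(pattern: str, value: str) -> bool:
    """True when the pattern matches somewhere in the value."""
    return re.search(pattern, value) is not None

# Elements from the table, tested against a whole value.
ELEMENT_CASES = [
    (r"12345|12347", "12347"),         # alternation
    (r"12345(-0001)?", "12345"),       # optional group absent
    (r"12345(-0001)?", "12345-0001"),  # optional group present
    (r"1234*", "123"),                 # * allows zero 4s
    (r"1234+", "123"),                 # + needs at least one 4
    (r"[a-c]", "b"),                   # character set
    (r"[^a-c]", "b"),                  # negated character set
    (r"A{3,5}", "AA"),                 # too few repetitions
    (r"A{3,}", "AAAAAA"),              # open-ended repetition
    (r"\d{5}", "12345"),               # five digits
]
for pattern, value in ELEMENT_CASES:
    print(pattern, value, whole_value_matches(pattern, value))

# ^ and $ only matter when the pattern may match part of a value,
# so these cases use a search instead of a whole-value match.
ANCHOR_CASES = [
    (r"^[0-9]", "0ABC"), (r"^[0-9]", "ABC0"),
    (r"[0-9]$", "ABC0"), (r"[0-9]$", "0ABC"),
    (r"^[0-9]$", "7"), (r"^[0-9]$", "77"),
]
for pattern, value in ANCHOR_CASES:
    print(pattern, value, found_in_value(pattern, value))
```

The first loop checks each element against a complete value, while the second uses a search so that the effect of ^ and $ is visible.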

Now let’s put the elements above together into some real world examples.

Schenectady, New York, USA

Suppose we are a merchant based in Schenectady, NY. We wish to create a zone to cover the city so that we can offer a special shipping method for customer pickup. We can use a regular expression pattern to do this:

Pattern: @^1234[0-9]$

Explanation: We used the square brackets to match a character range. This matches any value 12340 through 12349, which could be used to partially target the city of Schenectady, NY. Notice we have also used beginning and ending indicators so that we do not inadvertently accept the value 123456. Unfortunately, we have not yet recognized the values 12008 and 12325 that are also within city limits. (12345, another Schenectady ZIP code, already falls within the 12340 through 12349 range, so listing it separately in the patterns below is harmless but not required.)

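Pattern: @^(1234[0-9]|12008|12325|12345)$

Explanation: We combined the previous expression with the other ZIP codes using the vertical bar. The alternatives are wrapped in parentheses because the vertical bar splits the entire pattern: without the grouping, ^ would apply only to the first alternative and $ only to the last. This now matches all 5 digit ZIP codes assigned to Schenectady, NY. But wait… what if the customer puts in their plus 4 code? This pattern will not match the value 12345-0001.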
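One detail worth checking in a pattern like this is how the vertical bar interacts with the beginning and ending indicators. The sketch below uses Python's re module as a stand-in and assumes the pattern is allowed to match anywhere in the value, which is what the need for ^ and $ suggests. It compares a version with the alternatives left ungrouped against one with them grouped in parentheses. The variable and function names are hypothetical.

```python
import re

# Alternatives left ungrouped: ^ binds only to the first one, $ only to the last.
UNGROUPED = r"^1234[0-9]|12008|12325|12345$"
# Alternatives grouped: ^ and $ apply to every one of them.
GROUPED = r"^(1234[0-9]|12008|12325|12345)$"

def found(pattern: str, value: str) -> bool:
    """True when the pattern matches somewhere in the value."""
    return re.search(pattern, value) is not None

for value in ["12008", "12345", "120089", "912325", "123456"]:
    print(value, found(UNGROUPED, value), found(GROUPED, value))
```

Both versions accept 12008 and 12345, but only the ungrouped one also accepts 120089, 912325, and 123456, so the grouped form is the safer choice when the whole value must be one of the listed ZIP codes.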

Pattern: @^(1234[0-9]|12008|12325|12345)(-\d{4})?$

Explanation: We grouped the 5 digit ZIP codes with parentheses. Then we included a second group. It begins with a hyphen, meaning we expect the 5 digit ZIP code to be followed by a hyphen before the plus 4. Next comes the \d element that matches any digit, followed by the {4} quantifier to indicate there should be exactly 4 digits. The whole group is followed by the ? character, which means it may or may not appear.

With this pattern we match the following values:

12345, 12345-6789

But we do not match these values:

12350, 123456789, 12345-67890
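The values above can be checked with a short script. This is only a sketch that uses Python's re module in place of AbleCommerce's own matching; the pattern is the final one from this example with the @ prefix removed, and the names SCHENECTADY and in_schenectady_zone are hypothetical.

```python
import re

# Final Schenectady pattern, without the AbleCommerce-specific @ prefix.
SCHENECTADY = re.compile(r"^(1234[0-9]|12008|12325|12345)(-\d{4})?$")

def in_schenectady_zone(postal_code: str) -> bool:
    """True when the postal code satisfies the zone pattern."""
    return SCHENECTADY.search(postal_code) is not None

for code in ["12345", "12345-6789", "12350", "123456789", "12345-67890"]:
    print(code, in_schenectady_zone(code))
```

The first two values are accepted and the remaining three are rejected, in line with the lists above.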

Whistler, British Columbia, Canada

The regular expression pattern matching is particularly useful in non-US postal code matching because you can match specific sets of characters. Canada is notable because postal codes incorporate alphabetic as well as numeric values. Now suppose we wish to create a zone that specifically targets Whistler, BC.

Pattern: @^[A-Z][0-9][A-Z][0-9][A-Z][0-9]$

Explanation: This is a very basic example of a pattern that matches a Canadian postal code. The pattern is letter, digit, letter, digit, letter, digit. However it is not specific enough for our purpose.

Pattern: @^V0N1B[0-9]$

Explanation: This expression matches the range of postal codes from V0N1B0 through V0N1B9. That’s pretty good, but the usual convention is to put a space between the third and fourth characters.

Pattern: @^V0N ?1B[0-9]$

Explanation: Now we have added an optional space, so a customer could enter V0N1B0 or V0N 1B0 and it would be a match. But we also want to include the Olympic Village in our zone, so we need to factor in one more postal code.

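Pattern: @^(V0N ?1B[0-9]|V0N ?2T0)$

Explanation: The two alternatives are grouped in parentheses so that the beginning and ending indicators apply to both of them. Now our zone can identify all postal codes that belong to Whistler!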
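Here is a small sketch for trying out the Whistler zone, again using Python's re module as a stand-in for AbleCommerce's matching. The pattern is written in a factored form, with the shared V0N prefix pulled out and the two endings grouped in parentheses so that both the beginning and ending indicators apply to each postal code. The names WHISTLER and in_whistler_zone are hypothetical, and the sketch assumes postal codes are entered in upper case.

```python
import re

# Shared prefix, optional space, then either 1B0 through 1B9 or 2T0.
WHISTLER = re.compile(r"^V0N ?(1B[0-9]|2T0)$")

def in_whistler_zone(postal_code: str) -> bool:
    """True when the postal code satisfies the zone pattern."""
    return WHISTLER.search(postal_code) is not None

for code in ["V0N1B0", "V0N 1B9", "V0N 2T0", "V0N 1C0", "V0N 2T01", "XV0N 2T0"]:
    print(code, in_whistler_zone(code))
```

The first three values are accepted. The last three are rejected because they either fall outside the listed endings or carry extra characters before or after the postal code.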

Multiple Patterns

All of the above examples used a single pattern. You can include multiple patterns in the postal code filter. A customer entry must match at least one of the patterns in order to be considered part of the zone. Separate multiple patterns with a semicolon, like so:

@pattern1;pattern2;pattern3

A more realistic example might be:

@^1234[1-3]$;^1235[4-6]$;^1236[7-9]$

This would match any of the following codes:

12341, 12342, 12343, 12354, 12355, 12356, 12367, 12368, 12369
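The multiple pattern rule can be sketched in a few lines. This is not AbleCommerce's implementation: the function matches_filter is hypothetical, it uses Python's re module as a stand-in, and it assumes the filter is handled by removing the leading @ and splitting the remainder on semicolons.

```python
import re

def matches_filter(postal_filter: str, postal_code: str) -> bool:
    """Hypothetical sketch: True when any pattern in an @-style filter matches."""
    if not postal_filter.startswith("@"):
        raise ValueError("this sketch only handles filters that begin with @")
    # Drop the @ marker, then treat each semicolon-separated piece as its own pattern.
    patterns = [p for p in postal_filter[1:].split(";") if p]
    return any(re.search(p, postal_code) is not None for p in patterns)

ZONE_FILTER = "@^1234[1-3]$;^1235[4-6]$;^1236[7-9]$"

for code in ["12341", "12355", "12369", "12344", "12353"]:
    print(code, matches_filter(ZONE_FILTER, code))
```

With this filter, 12341, 12355, and 12369 are accepted while 12344 and 12353 are not. A plain split like this would not cope with a pattern that needs a literal semicolon, which is unlikely to matter for postal codes.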