Re: SQL wildcarding -- MIDRANGE-L

On 4/8/11 12:15 PM, James Lampert wrote:

I have a problem with phone number searching in a database.

The problem is that the phone numbers are not stored consistently.

Most U.S. phone numbers are stored with the punctuation squeezed out.
But there are a few that are stored including the punctuation.

Most non-U.S. phone numbers are stored "as-entered." But again,
there may be exceptions.

And we don't know which of these possibilities we're looking for
until after we find it.

Is there an SQL construct, that works back to V4R4, in which, if
we're given a raw value of, say, (714) 555-1212, we will find both
"(714) 555-1212" and "7145551212" in the file?

Might be worth adding a CHECK CONSTRAINT that prevents anything but digits [e.g. using TRANSLATE()], modifying existing data so that only thee digits remain in existing rows, and then fixing any programs that attempt violating the constraint and thus possibly failing.?

If the assumption that the data is always three groups of digits AAA, PPP, and NNNN holds, and if those digit-groups are stored in the format AAAPPPNNNN with the only non-digit data inserted between or around each group of digits stored in the column PHONENBR, then the following search would locate both of the noted sample [or other similarly formatted] phone number values. However such a generic selection almost ensures a full-table-scan implementation:

select ...
WHERE PhoneNbr LIKE '%AAA%PPP%NNNN%'

That might be written instead, given variables A, P, and N that are the character strings for each of the digit-groups, as:

select ...
WHERE PhoneNbr LIKE '%' concat :A
concat '%' concat :P
concat '%' concat :N concat '%'

Given there is only ever one possible prefix character and that character must be the '(' left parenthesis character, then the following is a better choice for likely enabling an index implementation for the first key on PhoneNbr [per first three or four bytes for cardinality]:

select ...
WHERE PhoneNbr LIKE 'AAA%PPP%NNNN%'
OR PhoneNbr LIKE '(AAA%PPP%NNNN%'
/* OR PhoneNbr LIKE '(AAA) PPP%NNNN%' -- or first nine bytes? */

Given there are only the two specific sample formats of interest, and those are left justified, then an even better selection for thee best probability for use of an index on that key for the query implementation would be:

select ...
WHERE PhoneNbr = 'AAAPPPNNNN'
OR PhoneNbr = '(AAA) PPP-NNNN'
/* OR PhoneNbr = '(AAA)PPP-NNNN' -- perhaps as a third value? */
/* although that is perhaps better written using IN predicate */

Regards, Chuck

This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].

Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.