On 4/8/11 12:15 PM, James Lampert wrote:
I have a problem with phone number searching in a database.
The problem is that the phone numbers are not stored consistently.
Most U.S. phone numbers are stored with the punctuation squeezed out.
But there are a few that are stored including the punctuation.
Most non-U.S. phone numbers are stored "as-entered." But again,
there may be exceptions.
And we don't know which of these possibilities we're looking for
until after we find it.
Is there an SQL construct, that works back to V4R4, in which, if
we're given a raw value of, say, (714) 555-1212, we will find both
"(714) 555-1212" and "7145551212" in the file?
Might be worth adding a CHECK CONSTRAINT that prevents anything but
digits [e.g. using TRANSLATE()], modifying existing data so that only
thee digits remain in existing rows, and then fixing any programs that
attempt violating the constraint and thus possibly failing.?
If the assumption that the data is always three groups of digits
AAA, PPP, and NNNN holds, and if those digit-groups are stored in the
format AAAPPPNNNN with the only non-digit data inserted between or
around each group of digits stored in the column PHONENBR, then the
following search would locate both of the noted sample [or other
similarly formatted] phone number values. However such a generic
selection almost ensures a full-table-scan implementation:
select ...
WHERE PhoneNbr LIKE '%AAA%PPP%NNNN%'
That might be written instead, given variables A, P, and N that are
the character strings for each of the digit-groups, as:
select ...
WHERE PhoneNbr LIKE '%' concat :A
concat '%' concat :P
concat '%' concat :N concat '%'
Given there is only ever one possible prefix character and that
character must be the '(' left parenthesis character, then the following
is a better choice for likely enabling an index implementation for the
first key on PhoneNbr [per first three or four bytes for cardinality]:
select ...
WHERE PhoneNbr LIKE 'AAA%PPP%NNNN%'
OR PhoneNbr LIKE '(AAA%PPP%NNNN%'
/* OR PhoneNbr LIKE '(AAA) PPP%NNNN%' -- or first nine bytes? */
Given there are only the two specific sample formats of interest, and
those are left justified, then an even better selection for thee best
probability for use of an index on that key for the query implementation
would be:
select ...
WHERE PhoneNbr = 'AAAPPPNNNN'
OR PhoneNbr = '(AAA) PPP-NNNN'
/* OR PhoneNbr = '(AAA)PPP-NNNN' -- perhaps as a third value? */
/* although that is perhaps better written using IN predicate */
Regards, Chuck
As an Amazon Associate we earn from qualifying purchases.