On 4/8/11 2:05 PM, CRPence wrote:
On 4/8/11 12:15 PM, James Lampert wrote:
I have a problem with phone number searching in a database.
The problem is that the phone numbers are not stored consistently.
Most U.S. phone numbers are stored with the punctuation squeezed out.
But there are a few that are stored including the punctuation.
Most non-U.S. phone numbers are stored "as-entered." But again,
there may be exceptions.
And we don't know which of these possibilities we're looking for
until after we find it.
Is there an SQL construct, that works back to V4R4, in which, if
we're given a raw value of, say, (714) 555-1212, we will find both
"(714) 555-1212" and "7145551212" in the file?
<<SNIP>>
If the assumption that the data is always three groups of digits AAA,
PPP, and NNNN holds, and if those digit-groups are stored in the format
AAAPPPNNNN with the only non-digit data inserted between or around each
group of digits stored in the column PHONENBR, then the following search
would locate both of the noted sample [or other similarly formatted]
phone number values. However such a generic selection almost ensures a
full-table-scan implementation:
select ...
WHERE PhoneNbr LIKE '%AAA%PPP%NNNN%'
That might be written instead, given variables A, P, and N that are the
character strings for each of the digit-groups, as:
select ...
WHERE PhoneNbr LIKE '%' concat :A
concat '%' concat :P
concat '%' concat :N concat '%'
<<SNIP>>
FWiW, something of the following might be of interest or useful.
An expression defining the value for the LIKE predicate could
probably be generated from a similarly "inconsistent" input value, or
even better, from a well-formed\consistent input value. Consider the
following scripted actions and results [where the sample input is the
literal '(714):555[1212]-' which is followed by the same example
specifying the variable :P]. Replacing all digits to a common character
like with some typical formatting characters, then grouping the results
with an optional COUNT(*) might be of interest to review for the most
common formats stored in the data; maybe to choose into what format an
update trigger might best effect to minimize changes.
<code>
> drop table qtemp/p
Drop of P in QTEMP complete.
> create table qtemp/p (pn char(18))
Table P created in QTEMP.
> insert into qtemp/p values
('(714) 555-1212') ,('(714)555-1212') ,('(714) 5551212')
,('714 555 1212' ) ,('714-555-1212' ) ,('7145551212' )
,('[714](555)1212') ,('a714\\p555n1212x34'),('714: 555.1212 x:44')
9 rows inserted in P in QTEMP.
> select pn,replace( translate( pn,' ','[]()-.:_<>'), ' ', '%')
from qtemp/p where pn like
replace( translate( '(714):555[1212]-',' ','[]()-.:_<>')
, ' ', '%')
-- or in a program
> select pn,replace( translate( pn,' ','[]()-.:_<>'), ' ', '%')
from qtemp/p where pn like
replace( translate( :P,' ','[]()-.:_<>'), ' ', '%')
....+....1....+....2....+....3....+...
PN REPLACE
(714) 555-1212 %714%%555%1212%%%%
(714)555-1212 %714%555%1212%%%%%
(714) 5551212 %714%%5551212%%%%%
714 555 1212 714%555%1212%%%%%%
714-555-1212 714%555%1212%%%%%%
7145551212 7145551212%%%%%%%%
[714](555)1212 %714%%555%1212%%%%
a714\\p555n1212x34 a714\\p555n1212x34
714: 555.1212 x:44 714%%555%1212%x%44
******** End of data ********
> select pn,
from qtemp/p where pn like
replace( translate( '(714):555[1212]-',' ','[]()-.:_<>')
, ' ', '%')
-- or in a program
> select pn,replace( translate( pn,' ','[]()-.:_<>'), ' ', '' )
from qtemp/p where pn like
replace( translate( :P,' ','[]()-.:_<>'), ' ', '%')
....+....1....+....2....+....3....+...
PN REPLACE
(714) 555-1212 7145551212
(714)555-1212 7145551212
(714) 5551212 7145551212
714 555 1212 7145551212
714-555-1212 7145551212
7145551212 7145551212
[714](555)1212 7145551212
a714\\p555n1212x34 a714\\p555n1212x34
714: 555.1212 x:44 7145551212x44
******** End of data ********
</code>
Or because the expression defining the value for the predicate should
be evaluated just once for the query, a UDF [or even several] to effect
some of the /better/ searches described in [or as can be inferred from]
prior examples is possible. For example a function which strips out the
non-digit characters can be used to recombine each of the digit groups
with the zero-to-many percent sign search character.
<code>
-- Instead of:
select ...
WHERE PhoneNbr LIKE 'AAA%PPP%NNNN%'
OR PhoneNbr LIKE '(AAA%PPP%NNNN%'
-- Perhaps:
select ...
WHERE PhoneNbr LIKE BldLikeAPN(:P) /* Gen: 'AAA%PPP%NNNN%' */
OR PhoneNbr LIKE '(' concat BldLikeAPN(:P)
</code>
Or like the last example in the quoted message... Outside of SQL or
using SQL, set the variables AAA, PPP, and NNNN, to the digit-groups
[strings] of the digits AAAPPPNNNN and then perhaps the following:
<code>
create table qtemp/pn (pn char(18))
Exec SQL
insert into qtemp/pn
select pn
,replace( translate( pn,' ', '[]()-.:_<>'), ' ', '' ) px
from p
where
pn = :P /* Exact Match to the value typed by user */
-- :P 20A inz('(714)-555-1212')
OR pn like replace( translate( :P,' ', '[]()-.:_<>'), ' ', '')
concat '%'
OR pn like :aaa concat '%' concat :ppp
concat '%' concat :nnnn concat '%'
OR pn like '(' concat :aaa concat '%' concat :ppp
concat '%' concat :nnnn concat '%'
;
</code>
Or perhaps [aiming for "prettier" SQL with more and more complex
tests] move all of the common possible formatting options, for both
EQuivalence and LiKe, into UDFs; being sure to avoid a replacement
character [percent or underscore] as the first byte in hopes of enabling
the implementation by an index:
where
pn = :EM /* Exact Match value typed by user */
OR pn = PhoneStyleUSAEQ(:EM) /* (123) 456-7890 */
OR pn = PhoneStyleUS1EQ(:EM) /* (123)456-7890 */
OR pn = PhoneStyleUS2EQ(:EM) /* 123-456-7890 */
OR pn like PhStyleDIGLK(:EM) /* 1234567890% */
OR pn like PhStyleUS1LK(:EM) /* (123%456%7890% */
OR pn like PhStyleUS2LK(:EM) /* 123%456%7890% */
where
pn IN ( :EM /* Exact Match value typed by user */
, PhoneStyleUSAEQ(:EM) /* (123) 456-7890 */
, PhoneStyleUS1EQ(:EM) /* (123)456-7890 */
, PhoneStyleUS2EQ(:EM) /* 123-456-7890 */
)
OR pn like PhoneStyleDIGLK(:EM) /* 1234567890% */
OR pn like PhoneStyleUS1LK(:EM) /* (123%456%7890% */
OR pn like PhoneStyleUS2LK(:EM) /* 123%456%7890% */
</code>
As an Amazon Associate we earn from qualifying purchases.