× The internal search function is temporarily non-functional. The current search engine is no longer viable and we are researching alternatives.
As a stop gap measure, we are using Google's custom search engine service.
If you know of an easy to use, open source, search engine ... please contact support@midrange.com.



On 4/8/11 2:05 PM, CRPence wrote:
On 4/8/11 12:15 PM, James Lampert wrote:
I have a problem with phone number searching in a database.

The problem is that the phone numbers are not stored consistently.

Most U.S. phone numbers are stored with the punctuation squeezed out.
But there are a few that are stored including the punctuation.

Most non-U.S. phone numbers are stored "as-entered." But again,
there may be exceptions.

And we don't know which of these possibilities we're looking for
until after we find it.

Is there an SQL construct, that works back to V4R4, in which, if
we're given a raw value of, say, (714) 555-1212, we will find both
"(714) 555-1212" and "7145551212" in the file?


<<SNIP>>

If the assumption that the data is always three groups of digits AAA,
PPP, and NNNN holds, and if those digit-groups are stored in the format
AAAPPPNNNN with the only non-digit data inserted between or around each
group of digits stored in the column PHONENBR, then the following search
would locate both of the noted sample [or other similarly formatted]
phone number values. However such a generic selection almost ensures a
full-table-scan implementation:

select ...
WHERE PhoneNbr LIKE '%AAA%PPP%NNNN%'

That might be written instead, given variables A, P, and N that are the
character strings for each of the digit-groups, as:

select ...
WHERE PhoneNbr LIKE '%' concat :A
concat '%' concat :P
concat '%' concat :N concat '%'

<<SNIP>>


FWiW, something of the following might be of interest or useful.

An expression defining the value for the LIKE predicate could probably be generated from a similarly "inconsistent" input value, or even better, from a well-formed\consistent input value. Consider the following scripted actions and results [where the sample input is the literal '(714):555[1212]-' which is followed by the same example specifying the variable :P]. Replacing all digits to a common character like with some typical formatting characters, then grouping the results with an optional COUNT(*) might be of interest to review for the most common formats stored in the data; maybe to choose into what format an update trigger might best effect to minimize changes.

<code>

> drop table qtemp/p
Drop of P in QTEMP complete.
> create table qtemp/p (pn char(18))
Table P created in QTEMP.
> insert into qtemp/p values
('(714) 555-1212') ,('(714)555-1212') ,('(714) 5551212')
,('714 555 1212' ) ,('714-555-1212' ) ,('7145551212' )
,('[714](555)1212') ,('a714\\p555n1212x34'),('714: 555.1212 x:44')
9 rows inserted in P in QTEMP.


> select pn,replace( translate( pn,' ','[]()-.:_<>'), ' ', '%')
from qtemp/p where pn like
replace( translate( '(714):555[1212]-',' ','[]()-.:_<>')
, ' ', '%')
-- or in a program
> select pn,replace( translate( pn,' ','[]()-.:_<>'), ' ', '%')
from qtemp/p where pn like
replace( translate( :P,' ','[]()-.:_<>'), ' ', '%')


....+....1....+....2....+....3....+...
PN REPLACE
(714) 555-1212 %714%%555%1212%%%%
(714)555-1212 %714%555%1212%%%%%
(714) 5551212 %714%%5551212%%%%%
714 555 1212 714%555%1212%%%%%%
714-555-1212 714%555%1212%%%%%%
7145551212 7145551212%%%%%%%%
[714](555)1212 %714%%555%1212%%%%
a714\\p555n1212x34 a714\\p555n1212x34
714: 555.1212 x:44 714%%555%1212%x%44
******** End of data ********

> select pn,
from qtemp/p where pn like
replace( translate( '(714):555[1212]-',' ','[]()-.:_<>')
, ' ', '%')
-- or in a program
> select pn,replace( translate( pn,' ','[]()-.:_<>'), ' ', '' )
from qtemp/p where pn like
replace( translate( :P,' ','[]()-.:_<>'), ' ', '%')


....+....1....+....2....+....3....+...
PN REPLACE
(714) 555-1212 7145551212
(714)555-1212 7145551212
(714) 5551212 7145551212
714 555 1212 7145551212
714-555-1212 7145551212
7145551212 7145551212
[714](555)1212 7145551212
a714\\p555n1212x34 a714\\p555n1212x34
714: 555.1212 x:44 7145551212x44
******** End of data ********

</code>


Or because the expression defining the value for the predicate should be evaluated just once for the query, a UDF [or even several] to effect some of the /better/ searches described in [or as can be inferred from] prior examples is possible. For example a function which strips out the non-digit characters can be used to recombine each of the digit groups with the zero-to-many percent sign search character.

<code>

-- Instead of:
select ...
WHERE PhoneNbr LIKE 'AAA%PPP%NNNN%'
OR PhoneNbr LIKE '(AAA%PPP%NNNN%'

-- Perhaps:
select ...
WHERE PhoneNbr LIKE BldLikeAPN(:P) /* Gen: 'AAA%PPP%NNNN%' */
OR PhoneNbr LIKE '(' concat BldLikeAPN(:P)

</code>

Or like the last example in the quoted message... Outside of SQL or using SQL, set the variables AAA, PPP, and NNNN, to the digit-groups [strings] of the digits AAAPPPNNNN and then perhaps the following:

<code>

create table qtemp/pn (pn char(18))

Exec SQL
insert into qtemp/pn
select pn
,replace( translate( pn,' ', '[]()-.:_<>'), ' ', '' ) px
from p
where
pn = :P /* Exact Match to the value typed by user */
-- :P 20A inz('(714)-555-1212')
OR pn like replace( translate( :P,' ', '[]()-.:_<>'), ' ', '')
concat '%'
OR pn like :aaa concat '%' concat :ppp
concat '%' concat :nnnn concat '%'
OR pn like '(' concat :aaa concat '%' concat :ppp
concat '%' concat :nnnn concat '%'
;

</code>

Or perhaps [aiming for "prettier" SQL with more and more complex tests] move all of the common possible formatting options, for both EQuivalence and LiKe, into UDFs; being sure to avoid a replacement character [percent or underscore] as the first byte in hopes of enabling the implementation by an index:

where
pn = :EM /* Exact Match value typed by user */
OR pn = PhoneStyleUSAEQ(:EM) /* (123) 456-7890 */
OR pn = PhoneStyleUS1EQ(:EM) /* (123)456-7890 */
OR pn = PhoneStyleUS2EQ(:EM) /* 123-456-7890 */
OR pn like PhStyleDIGLK(:EM) /* 1234567890% */
OR pn like PhStyleUS1LK(:EM) /* (123%456%7890% */
OR pn like PhStyleUS2LK(:EM) /* 123%456%7890% */


where
pn IN ( :EM /* Exact Match value typed by user */
, PhoneStyleUSAEQ(:EM) /* (123) 456-7890 */
, PhoneStyleUS1EQ(:EM) /* (123)456-7890 */
, PhoneStyleUS2EQ(:EM) /* 123-456-7890 */
)
OR pn like PhoneStyleDIGLK(:EM) /* 1234567890% */
OR pn like PhoneStyleUS1LK(:EM) /* (123%456%7890% */
OR pn like PhoneStyleUS2LK(:EM) /* 123%456%7890% */

</code>


As an Amazon Associate we earn from qualifying purchases.

This thread ...


Follow On AppleNews
Return to Archive home page | Return to MIDRANGE.COM home page

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].

Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.