Charles,
I have been playing (in my spare time...) with a metaphone based x-ref service. It will parse its input to generate a metaphone for each "word" that it finds, then adds/updates its database for that "field". On the other end, the database can be searched to find associations with other records. One could use a ranking function to find records with similar content.
It's little more than a basic proof-of-concept right now, but it seems to function well enough to be useful...
Eric
-----Original Message-----
From: rpg400-l-bounces@xxxxxxxxxxxx
[mailto:rpg400-l-bounces@xxxxxxxxxxxx]On Behalf Of Wilt, Charles
Sent: Wednesday, September 05, 2007 8:07 AM
To: RPG programming on the AS400 / iSeries
Subject: RE: Finding duplicate customer names
Mike & Carel,
Thanks for the reply.
SOUNDEX was already at the top of my plans.
However, I was thinking I remembered some other techniques.
What I came up with Friday was the following:
insert into ctsc_w/dupcust (
SELECT c.CUSNBR, c.CUSNAM, D.ADDR1, soundex(cusnam) as
name, soundex(d.addr1) as deladr1, d.addr2, soundex(d.addr2)
as deladr2, substr(d.zip,1,5)
FROM customp C left
outer join addrinp D on deladr = d.akey
WHERE c.locnbr = '00B61' and cussts <> 'D'
)
SELECT CUSNBR, CUSNAM, ADDR1, ADDR2, ZIP, NAME, DELADR1, DELADR2
FROM ctsc_w/dupcust A
WHERE exists (select 1 from ctsc_w/dupcust B
where a.cusnbr <> b.cusnbr
and a.name = B.name and a.zip = b.zip
and (( a.deladr1 <> 'Z000'
and a.deladr1 in (B.deladr1,b.deladr2))
or( a.deladr2 <> 'Z000'
and a.deladr2 in (B.deladr1,b.deladr2)))
)
ORDER BY name, deladr1
Thanks again,
Charles
-----Original Message-----
From: rpg400-l-bounces@xxxxxxxxxxxx
[mailto:rpg400-l-bounces@xxxxxxxxxxxx] On Behalf Of Carel Teijgeler
Sent: Friday, August 31, 2007 6:22 PM
To: rpg400-l@xxxxxxxxxxxx
Subject: Re: Finding duplicate customer names
Charles,
SQL Function SOUNDEX, perhaps?
Regards,
Carel Teijgeler
*********** REPLY SEPARATOR ***********
On 31-8-2007 at 16:29 Wilt, Charles wrote:
I seem to recall some discussions in the past about various
algorithms
that would help identify
duplicate customers names or some such field where you can't
depend on
an
exact match.
Anybody remember this, or care to share what's worked for you?
--
This is the RPG programming on the AS400 / iSeries (RPG400-L)
mailing list To post a message email: RPG400-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/rpg400-l.
This e-mail transmission contains information that is intended to be confidential and privileged. If you receive this e-mail and you are not a named addressee you are hereby notified that you are not authorized to read, print, retain, copy or disseminate this communication without the consent of the sender and that doing so is prohibited and may be unlawful. Please reply to the message immediately by informing the sender that the message was misdirected. After replying, please delete and otherwise erase it and any attachments from your computer system. Your assistance in correcting this error is appreciated.
As an Amazon Associate we earn from qualifying purchases.