|
See below, I had saved it but can't find original link
A Fuzzy Search Algorithm
Most inquiry programs require a user to enter the exact value or exact
beginning portion of a value to be located. Unfortunately such programs
do not find the desired data if the user doesn't remember the exact
spelling of the search term. I have an algorithm that allows searching
by any portion of a database field, even if the user misspells the
search argument.
I assign a matching rate based on the difference in the number of
positions. If a certain character is found in the same position of both
the search string and the database field, I assign a matching rate of
100. If the two are one position away from each other, the rate is 90.
Here is the complete scale:
Absolute difference between character positions Matching rate
0 100
1 90
2 80
3 70
4 60
5 50
6 40
7 30
8 20
9 10
10 or more 0
Consider, for example, a file that consists of two fields: a part number
and part description. This file contains three records, as per this
example:
Part Number Part Description
111 WIFE
222 STOP
333 KNIFE
These are the calculated matching rates.
WIFE 200
STOP 0
KNIFE 300
-- Victor Pisman
Victor's source code is shown below. It consists of three objects--an
input item master file, an output file to hold the results of the
search, and an RPG program to perform the search.
-- Ted
Output file IMOUT
A REF(IMIN)
A R FM$OUT
A XXITNO R REFFLD(IMITNO)
A XXDESC R REFFLD(IMDESC)
A XXRATE 3 0
A K XXRATE
<font face='times new roman',serif">Program IM
* Fuzzy search algorithm
* written by Victor Pisman
*
FIMIN IF E DISK
FIMOUT O E DISK A
*
E AAA 15 1
E BBB 15 3 0
*
I 'ABCDEFGHIJKLMNOPQRST-C UC
I 'UVWXYZ'
*
I 'abcdefghijklmnopqrst-C LC
I 'uvwxyz'
*
C *ENTRY PLIST
C PARM ##PARM 15
*
C LC:UC XLATE##PARM ##PARM
*
C READ IMIN 99
C *IN99 DOWEQ*OFF
C EXSR SEARCH
C Z-ADDTTRATE XXRATE
C MOVELIMITNO XXITNO
C MOVELIMDESC XXDESC
C WRITEFM$OUT
C READ IMIN 99
C ENDDO
*
C MOVE *ON *INLR
C***********
C SEARCH BEGSR
***********
* Get the length of the search argument
C MOVEL##PARM ENTRY 15 P
C ' ' CHEKRENTRY L 20
* FILL THE ARRAY AAA WITH THE INPUT FIELD VALUE.
C CLEARAAA
C Z-ADD1 I 50
C 1 DO L I
C 1 SUBSTENTRY:I AAA,I
C ENDDO
*
* FILL THE ARRAY BBB.
C CLEARBBB
C Z-ADD*ZERO BBBTOT
C *LIKE DEFN BBB BBBTOT+ 1
C Z-ADD1 I
*
* CHECK FOR EXACT MATCH FIRST.
C ENTRY:L SCAN IMDESC:1 50
C *IN50 IFEQ *OFF NO EXACT MATCH
*
C 1 DO L I
C LC:UC XLATEIMDESC XXNAME
C *LIKE DEFN IMDESC XXNAME
C AAA,I SCAN XXNAME RATE 30 50
C *IN50 IFEQ *ON
C I SUB RATE BBB,I
C ADD BBB,I BBBTOT
C ELSE
C Z-ADD99 BBB,I
C ENDIF
C ENDDO
*
C ENDIF EXACT MATCH
*
* CALCULATE THE NUMBER OF LEADING BLANKS X IN THE INPUT FIELD.
C L IFGT *ZERO
C BBBTOT DIV L X 30H
C ENDIF
*
* CALCULATE THE ABSOLUTE VALUE OF INDIVIDUAL DIFFERENCES.
C Z-ADD1 I
C 1 DO L I
C BBB,I IFNE 99
C SUB X BBB,I
C ENDIF
C BBB,I IFLT *ZERO
C MULT -1 BBB,I
C ENDIF
*
C BBB,I IFLT 10
C MULT -10 BBB,I
C ADD 100 BBB,I
C ELSE
C Z-ADD*ZERO BBB,I
C ENDIF
*
C ENDDO
* CALCULATE TOTAL MATCHING RATE
C XFOOTBBB TTRATE 50
*
C ENDSR
-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx
[mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of jamesl@xxxxxxxxxxx
Sent: Friday, July 18, 2003 12:35 PM
To: midrange-l@xxxxxxxxxxxx
Subject: Fuzzy string matching?
Would anybody happen to know where I might find a "fuzzy" string match
algorithm that either is, or can be, implemented in ILE C or ILE RPG?
Something
that could recognize slightly misspelled
words?
--
J.Lampert
As an Amazon Associate we earn from qualifying purchases.
This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.