× The internal search function is temporarily non-functional. The current search engine is no longer viable and we are researching alternatives.
As a stop gap measure, we are using Google's custom search engine service.
If you know of an easy to use, open source, search engine ... please contact support@midrange.com.



See below, I had saved it but can't find original link


A Fuzzy Search Algorithm 



Most inquiry programs require a user to enter the exact value or exact
beginning portion of a value to be located. Unfortunately such programs
do not find the desired data if the user doesn't remember the exact
spelling of the search term. I have an algorithm that allows searching
by any portion of a database field, even if the user misspells the
search argument. 


 
 
I assign a matching rate based on the difference in the number of
positions. If a certain character is found in the same position of both
the search string and the database field, I assign a matching rate of
100. If the two are one position away from each other, the rate is 90.
Here is the complete scale: 


Absolute difference between character positions  Matching rate  
0  100  
1  90  
2  80  
3  70  
4  60  
5  50  
6  40  
7  30  
8  20  
9  10  
10 or more  0  


Consider, for example, a file that consists of two fields: a part number
and part description. This file contains three records, as per this
example: 

Part Number  Part Description  
111  WIFE  
222  STOP  
333  KNIFE  


These are the calculated matching rates. 

WIFE  200  
STOP  0  
KNIFE  300  


-- Victor Pisman 

Victor's source code is shown below. It consists of three objects--an
input item master file, an output file to hold the results of the
search, and an RPG program to perform the search. 

-- Ted 




Output file IMOUT 

     A                                      REF(IMIN)
     A          R FM$OUT
     A            XXITNO    R               REFFLD(IMITNO)
     A            XXDESC    R               REFFLD(IMDESC)
     A            XXRATE         3  0
     A          K XXRATE 

<font face='times new roman',serif">Program IM 

  * Fuzzy search algorithm
  * written by Victor Pisman
  *
 FIMIN    IF  E                    DISK
 FIMOUT   O   E                    DISK                      A
  *
 E                    AAA        15  1
 E                    BBB        15  3 0
  *
 I              'ABCDEFGHIJKLMNOPQRST-C         UC
 I              'UVWXYZ'
  *
 I              'abcdefghijklmnopqrst-C         LC
 I              'uvwxyz'
  *
 C           *ENTRY    PLIST
 C                     PARM           ##PARM 15
  *
 C           LC:UC     XLATE##PARM    ##PARM
  *
 C                     READ IMIN                     99
 C           *IN99     DOWEQ*OFF
 C                     EXSR SEARCH
 C                     Z-ADDTTRATE    XXRATE
 C                     MOVELIMITNO    XXITNO
 C                     MOVELIMDESC    XXDESC
 C                     WRITEFM$OUT
 C                     READ IMIN                     99
 C                     ENDDO
  *
 C                     MOVE *ON       *INLR
 C***********
 C           SEARCH    BEGSR
  ***********
  * Get the length of the search argument
 C                     MOVEL##PARM    ENTRY  15 P
 C           '  '      CHEKRENTRY     L       20
  * FILL THE ARRAY AAA WITH THE INPUT FIELD VALUE.
 C                     CLEARAAA
 C                     Z-ADD1         I       50
 C           1         DO   L         I
 C           1         SUBSTENTRY:I   AAA,I
 C                     ENDDO
  *
  * FILL THE ARRAY BBB.
 C                     CLEARBBB
 C                     Z-ADD*ZERO     BBBTOT
 C           *LIKE     DEFN BBB       BBBTOT+ 1
 C                     Z-ADD1         I
  *
  * CHECK FOR EXACT MATCH FIRST.
 C           ENTRY:L   SCAN IMDESC:1                 50
 C           *IN50     IFEQ *OFF                       NO EXACT MATCH
  *
 C           1         DO   L         I
 C           LC:UC     XLATEIMDESC    XXNAME
 C           *LIKE     DEFN IMDESC    XXNAME
 C           AAA,I     SCAN XXNAME    RATE    30     50
 C           *IN50     IFEQ *ON
 C           I         SUB  RATE      BBB,I
 C                     ADD  BBB,I     BBBTOT
 C                     ELSE
 C                     Z-ADD99        BBB,I
 C                     ENDIF
 C                     ENDDO
  *
 C                     ENDIF                           EXACT MATCH
  *
  * CALCULATE THE NUMBER OF LEADING BLANKS X IN THE INPUT FIELD.
 C           L         IFGT *ZERO
 C           BBBTOT    DIV  L         X       30H
 C                     ENDIF
  *
  * CALCULATE THE ABSOLUTE VALUE OF INDIVIDUAL DIFFERENCES.
 C                     Z-ADD1         I
 C           1         DO   L         I
 C           BBB,I     IFNE 99
 C                     SUB  X         BBB,I
 C                     ENDIF
 C           BBB,I     IFLT *ZERO
 C                     MULT -1        BBB,I
 C                     ENDIF
  *
 C           BBB,I     IFLT 10
 C                     MULT -10       BBB,I
 C                     ADD  100       BBB,I
 C                     ELSE
 C                     Z-ADD*ZERO     BBB,I
 C                     ENDIF
  *
 C                     ENDDO
  * CALCULATE TOTAL MATCHING RATE
 C                     XFOOTBBB       TTRATE  50
  *
 C                     ENDSR

-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx
[mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of jamesl@xxxxxxxxxxx
Sent: Friday, July 18, 2003 12:35 PM
To: midrange-l@xxxxxxxxxxxx
Subject: Fuzzy string matching?


Would anybody happen to know where I might find a "fuzzy" string match

algorithm that either is, or can be, implemented in ILE C or ILE RPG?
Something 
that could recognize slightly misspelled
words?

--
J.Lampert




As an Amazon Associate we earn from qualifying purchases.

This thread ...

Replies:

Follow On AppleNews
Return to Archive home page | Return to MIDRANGE.COM home page

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].

Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.