Re: Generate hash code for a source member? -- MI400

Hello, Dan:

I would like to add the following thoughts:

1. although it is always possible that two slightly different source
members could generate the same "check-sum" or "hash code",
it is pretty unlikely;

2. if two members generate the same hash-code or check-sum,
they are probably the same (with, say, > 99% accuracy);

but,

3. if two members generate a different hash-code or check-sum,
they are definitely NOT THE SAME.

So, with that in mind, I think that it is fairly "safe" to compare
hash-codes or check-sums, along with some other information that
is easy to obtain. For example, you can run
DSPFD ... TYPE(*MBR or *MBRLIST) OUTPUT(*OUTFILE),
and compare the result to the same outfile created on the "master"
machine, looking for obvious differences such as member change
date/time stamps, number of records, etc.

So, I am suggesting that a hash-code or check-sum can be used
in combination with several other "sanity checks", such as checking
that the members have the same number of records, the same record
length, etc., before deciding that two members are in all probability
truly the same.
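
To make the idea concrete, here is a minimal sketch in Python (an
illustration only, not something you would run on the AS/400 as-is).
It assumes each member is available as a list of fixed-length records
and that the text is EBCDIC code page 37; the names xfoot32,
member_fingerprint and probably_same are made up for this example.
The check-sum is the XFOOT over groups of 4 characters discussed in
the quoted message below, with the RRN used as a multiplier so that
switched records change the result.

```python
def xfoot32(record: str) -> int:
    """Sum a record's EBCDIC bytes as big-endian 32-bit groups, modulo 2**32."""
    data = record.encode("cp037")          # assumption: EBCDIC code page 37
    data += b"\x40" * (-len(data) % 4)     # pad with EBCDIC blanks to a multiple of 4
    total = 0
    for k in range(0, len(data), 4):
        total += int.from_bytes(data[k:k + 4], "big")
    return total & 0xFFFFFFFF


def member_fingerprint(records):
    """Return (record count, record length, check-sum) for one member."""
    checksum = 0
    for rrn, rec in enumerate(records, start=1):
        # Multiplying by the RRN makes the result depend on record order.
        checksum = (checksum + rrn * xfoot32(rec)) & 0xFFFFFFFF
    reclen = len(records[0]) if records else 0
    return len(records), reclen, checksum


def probably_same(member_a, member_b) -> bool:
    """True if all sanity checks and the check-sum agree.

    False means the members are definitely different;
    True only means they are probably the same.
    """
    return member_fingerprint(member_a) == member_fingerprint(member_b)
```

With code page 37, xfoot32("ABCD") gives 3250766788 and
xfoot32("BACD") gives 3267478468, the values quoted below. Note that
moving a whole 4-character group within a record (for example,
"ABCDEFGH" versus "EFGHABCD") does not change the sum, which is one
reason a matching check-sum only ever means "probably the same".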

As others (Gene Gaunt) have suggested, the ultimate tool to
compare members is the IBM CMPPFM command. Of course,
this requires you to save an entire source physical file on one system,
send it over to the other system, and restore it there (in another
library) before you can do the compare.

NOTE: the CMPPFM command has a "nice" feature that many people are
apparently unaware of: it can compare *ALL members in one source
file with all members in another source file, and report only the
"differences", where "differences" include any members found in one
file but not in the other, as well as members that exist in both files
but contain different data. The syntax for this multi-member
comparison is:

    CMPPFM NEWFILE(newlib/QRPGSRC) +
           NEWMBR(*ALL) +
           OLDFILE(oldlib/*NEWFILE) +
           OLDMBR(*NEWMBR) +
           CMPTYPE(*LINE) +
           RPTTYPE(*SUMMARY) +
           OUTPUT(*PRINT)

Some of you may prefer to use RPTTYPE(*DIFF) ...

Also, some of you who have been around IBM (mainframe) systems
for a long time may recognize the format of the output reports of the
CMPPFM command; and, yes, it is essentially the old SUPER-C
(SuperCompare) utility, ported from the mainframe to the AS/400. ;-)

SuperC was made available by IBM for OS/400, from about V1R3
to V2R3, as a PRPQ, before CMPPFM was integrated into PDM.

For those not familiar with SuperC, it uses a hash-coding technique
to quickly isolate lines that are different in the two members. For more
information about such source comparison techniques, I refer you to
the (computer science) literature:

Paul Heckel, "A Technique for Isolating Differences Between Files",
Communications of the ACM, Vol. 21, No. 4, April 1978

The diff command provided with most Unix and Linux distributions
also hashes lines before comparing them, although it is built on a
longest-common-subsequence algorithm rather than Heckel's technique.
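
For the curious, here is a simplified sketch in Python of the
line-matching passes described in Heckel's paper, as I understand
them. It is not SuperC's actual code. It omits the paper's handling
of the start and end of the files as implicit matches and produces no
report, and it relies on Python's built-in hashing of strings rather
than a hand-built symbol table. The function name heckel_match is my
own.

```python
from collections import Counter


def heckel_match(old, new):
    """Pair up lines of `new` with lines of `old`.

    Returns (new_to_old, old_to_new): lists holding the index of the
    matching line in the other file, or None where no match was found.
    """
    old_count = Counter(old)
    new_count = Counter(new)
    old_pos = {line: i for i, line in enumerate(old)}
    new_to_old = [None] * len(new)
    old_to_new = [None] * len(old)

    # Lines occurring exactly once in each file must be the same line.
    for i, line in enumerate(new):
        if new_count[line] == 1 and old_count[line] == 1:
            j = old_pos[line]
            new_to_old[i] = j
            old_to_new[j] = i

    # Forward pass: extend each match to the following line if it is identical.
    for i in range(len(new) - 1):
        j = new_to_old[i]
        if (j is not None and j + 1 < len(old)
                and new_to_old[i + 1] is None and old_to_new[j + 1] is None
                and new[i + 1] == old[j + 1]):
            new_to_old[i + 1] = j + 1
            old_to_new[j + 1] = i + 1

    # Backward pass: extend each match to the preceding line if it is identical.
    for i in range(len(new) - 1, 0, -1):
        j = new_to_old[i]
        if (j is not None and j > 0
                and new_to_old[i - 1] is None and old_to_new[j - 1] is None
                and new[i - 1] == old[j - 1]):
            new_to_old[i - 1] = j - 1
            old_to_new[j - 1] = i - 1

    return new_to_old, old_to_new
```

Any line left as None in new_to_old was inserted, and any line left
as None in old_to_new was deleted; matched lines whose relative order
changed were moved.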

Regards,

Mark S. Waterbury

----- Original Message -----
From: <thomas@inorbit.com>
To: <mi400@midrange.com>
Sent: Thursday, May 09, 2002 2:51 PM
Subject: RE: [MI400] Generate hash code for a source member?


> Dan:
>
> First thing to keep in mind is that no hash value is going to be
> foolproof unless your hashes have as many significant characters
> as your largest members. Hashes can be pretty good, but they
> won't guarantee uniqueness.
>
> With that in mind, note that _most_ switched characters will
> indeed generate different hashes; the XFOOT was suggested over
> groups of 4 characters for 32 bits. "ABCD" is definitely a
> different 32-bit value from "BACD", e.g., 3250766788 vs.
> 3267478468. Similarly, "ABCD" and "EFGH" are together different
> from "ABCE" and "DFGH". That is, while many transpositions won't
> be caught, most of them will be caught.
>
> Switched records are slightly more trouble, but something such as
> RRN being used as a kind of seed value should help.
>
> The real question comes down to exactly how precise do you need
> this to be? Do you need to guarantee you'll catch every
> duplication or variation?
>
> Tom Liotta
>
> "Dan Bale" wrote
>
> > A simple XFOOT solution, I'm thinking, will not catch changes where
> > characters are switched, or where records are switched.  I realize
> > I could introduce some logic to multiply each element in a record by
> > a different value, and do something similar by RRN, and then have to
> > deal with overflow,
>
> --
> Tom Liotta
> The PowerTech Group, Inc.
> 19426 68th Avenue South
> Kent, WA 98032
> Phone  253-872-7788
> Fax  253-872-7904
> http://www.400Security.com