Re: MD5 hash with ASCII conversion -- RPG400-L

As usual, Scott, you are generously thorough. You're right, yikes.
Comments & responses inline:

On Thu, Jul 9, 2009 at 7:14 PM, Scott Klement <rpg400-l@xxxxxxxxxxxxxxxx> wrote:

Hi Dan,

Yikes, there are so many things that can go wrong with the approach
you're taking. It's certainly possible to get it to work, but I'm
wondering if you wouldn't be better served by taking a different approach.

Here's the approach I would recommend (though I don't know that much
about the scenario, so I may be off base):

1) Use CPYTOIMPF or some other tool to convert your PF to an ASCII file
in the IFS. Get it exactly the way the Windows system wants it.

2) Run your MD5 against the IFS file.

3) FTP the file to the Windows box in BINARY mode so nothing can be
changed in the data during transit.
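Step 2 is the easy part once the file is byte-for-byte what Windows will receive. Here is a rough sketch in Python rather than RPG, not tied to any IBM i API. It computes an MD5 over a large stream file in fixed-size chunks, so a half-gigabyte file never has to fit in memory. If the bytes are identical, the same calculation on both ends gives the same digest. The function name and the 1 MiB chunk size are illustrative choices, not anything from the thread.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    Reading fixed-size chunks keeps memory use flat regardless of file
    size; chunk_size is an arbitrary illustrative default.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:  # binary mode: hash the exact bytes on disk
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

To check a transfer, compare this value against one computed on the Windows copy (for example, `certutil -hashfile <file> MD5`). Matching digests mean the transfer changed nothing.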

So that's my recommendation. That will ensure that your MD5 is
calculated on the same data that the Windows one will be. That's
crucial, because the main purpose of an MD5 hash is to verify that the
data is identical. Any change to the data will give you a different
hash... which is why your current approach is so difficult.

Your current approach is tricky because you have to try to calculate
exactly what will be done to the file. In other words, you have to try
to guess the future :) Then, change each record in your file to look
like it will be in the future, and calculate the hash on that.
Possible, but tricky.

I *think* the CPYTOIMPF is a non-starter for several reasons:
1) The current process of transferring the files is set in production
concrete. It will be easier to work around this with the approach I am
attempting. (A little background: Currently, the on-call person has to
"validate" these transfers manually, by opening up these files on the
Windows server in a text editor. The larger ones take about 20 minutes to
open. Once the file is opened, we "look" at the beginning, middle, and end
for problems. We've had FTP transfers that return a successful completion
status, but where we've had corruption in the file on the Windows box.
Needless to say, this is a time-consuming process that does not guarantee
we'll find problems that are lurking somewhere in the file.)
2) These are *huge* files we're moving, about half a GB each, and about 20 of
those every day. Effectively doubling the number of transfers would, I think,
impact our promised timelines.
3) I believe we would still need to run another MD5 hash once the file
hits the final destination. Is there something about a binary transfer that
is more, um, reliable in terms of ensuring a complete transfer? To my
limited knowledge, there is no guarantee that the binary FTP transfer
between the IFS and the Windows server will give a perfect match (emphasis
on "guarantee").

If time were not the issue, I would just FTP the file back from the Windows
server to the i (saving it with a different file name) and then compare the
two files on the i by running the MD5 hash on those.

I think I understand that your emphasis is on the uncertainty of *getting* the
translation *right*. But once (if?) I get it right, I presume it would work
consistently as designed.

If you persist with this approach, here are things to consider:

1) Translating EBCDIC to ASCII. You are already doing this, but I
wonder if you've thought of everything? For example, if your job is
CCSID 65535, RPG won't translate the file as it's read. But, if your
job ever changes, it will translate the data as it's read. So you can't
simply assume the data in your program will be the CCSID of the file,
and you can't assume it'll be the CCSID of the job. It could be either.
Plus, RPG uses the "mixed byte CCSID that corresponds to the job's
CCSID" so you can't really use the job's CCSID directly anyway, even if
you know it'll never be used on a 65535 system. It's a bit tricky.
Once you get the right EBCDIC CCSID, then you also have to make sure
you're using the right ASCII one.
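To see why the CCSID pair matters, here is a minimal Python sketch. It does not use RPG, QTQCVRT or iconv(). It translates a record using explicitly named code pages instead of whatever the job happens to be running under. It assumes the data is CCSID 37 (US EBCDIC, Python codec `cp037`) and that the Windows side expects CCSID 1252 (Python codec `cp1252`). Both are assumptions, so substitute your real pair. The function name is hypothetical.

```python
# Assumed code pages: CCSID 37 (US EBCDIC) in, CCSID 1252 (Windows Latin-1) out.
EBCDIC_CODEC = "cp037"
ASCII_CODEC = "cp1252"


def ebcdic_to_ascii(record: bytes) -> bytes:
    """Translate one EBCDIC record to the assumed ASCII code page.

    Naming both code pages explicitly means the result does not depend on
    the CCSID of the job doing the work. Characters with no equivalent in
    the target code page raise UnicodeEncodeError instead of being
    silently substituted.
    """
    text = record.decode(EBCDIC_CODEC)
    return text.encode(ASCII_CODEC)


# 'A', 'B', 'C' are X'C1C2C3' in CCSID 37 but X'414243' in ASCII.
print(ebcdic_to_ascii(b"\xc1\xc2\xc3"))  # b'ABC'
```

The same three characters are different bytes in the two code pages, so they produce different MD5 values. An unnoticed wrong CCSID on either side therefore shows up only as a mismatched hash.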

I can guarantee you that I haven't thought of everything! When I said I
cobbled this together, I wasn't kidding! Wow, Scott, this translation
business is complicating things, isn't it? I want control! Tell me I can
control all of this! Can I use CHGJOB CCSID to control that? Does FTP
indicate the CCSID of the target file?

Indeed, normally you would NOT want to translate data before calculating
an MD5, because therein lies the road to madness. But in your case
you have to because you're trying to predict what the file will end up
as...

I glanced at your code, and you do not appear to be using iconv().
Wouldn't iconv() work better than QTQCVRT?

I have no idea. Again, "cobble" is key here. ;-) Are you suggesting that
it would or should?

2) Trimming trailing blanks... sounds like you're already doing this.

3) Adding CRLF. FTP in ASCII mode will add CRLF to the end of every
record; however, the FTP client MIGHT change that to just LF (probably
not if the client is Windows, but on Unix it would... and some clients
offer that option on any platform.)
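Items 2 and 3 together determine the exact bytes each record becomes. The Python sketch below assumes records have already been translated to ASCII and that only trailing blanks (X'20') are trimmed. It shows that the choice between CRLF and LF is enough on its own to change the digest. The function name and sample record are hypothetical.

```python
import hashlib


def as_transferred(ascii_record: bytes, eol: bytes = b"\r\n") -> bytes:
    """Trim trailing blanks and append the line ending the client will write."""
    return ascii_record.rstrip(b" ") + eol


record = b"CUSTOMER 0001    "  # fixed-length record padded with blanks
crlf = hashlib.md5(as_transferred(record)).hexdigest()
lf = hashlib.md5(as_transferred(record, eol=b"\n")).hexdigest()
print(crlf == lf)  # False: one missing CR byte changes the hash
```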

There is no "end of file" character normally. Some very old PC software
uses Ctrl-Z as end of file, but I don't think FTP does this, so it
wouldn't be an issue for you.

That's my $0.02 for now.

And it's much appreciated!

I think I'm going to try appending the CRLF characters to the %trimmed input
record and see if I get any satisfaction doing that.
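The plan above comes down to one loop. Translate each record, trim trailing blanks, append CRLF, and feed the result into a running MD5 so the whole file never has to be built in memory. This Python sketch assumes fixed-length EBCDIC records in CCSID 37 (`cp037`) and a CCSID 1252 (`cp1252`) target, and both are assumptions. The hypothetical `predicted_md5` function stands in for whatever combination of RPG, QTQCVRT or iconv() and hashing API the real program uses.

```python
import hashlib
from typing import Iterable


def predicted_md5(records: Iterable[bytes],
                  ebcdic_codec: str = "cp037",
                  ascii_codec: str = "cp1252",
                  eol: bytes = b"\r\n") -> str:
    """MD5 of the file as it is expected to look after an ASCII-mode FTP.

    Each EBCDIC record is translated, trimmed of trailing blanks, given a
    line ending, and fed to a running digest one record at a time.
    """
    digest = hashlib.md5()
    for record in records:
        text = record.decode(ebcdic_codec).rstrip(" ")
        digest.update(text.encode(ascii_codec) + eol)
    return digest.hexdigest()


# Two 8-byte EBCDIC records: "AB" and "12", each padded with X'40' blanks.
records = [b"\xc1\xc2" + b"\x40" * 6, b"\xf1\xf2" + b"\x40" * 6]
print(predicted_md5(records) == hashlib.md5(b"AB\r\n12\r\n").hexdigest())  # True
```

Compare this value with an MD5 computed on the file that actually arrived on the Windows server. A match confirms that the prediction of translation, trimming and line endings was right. A mismatch points back at one of those three steps, or at the transfer itself.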

I'll post results when they become available.

Thanks!
- Dan