Scott;
Thanks for all you do!
One question though: shouldn't you have used a sausage analogy instead of a jam analogy? You're a Klement, not a Smucker! :-)
Duane Christen
--
Duane Christen
Senior Software Engineer
(319) 790-7162
Duane.Christen@xxxxxxxxxx
Visit PAETEC.COM
-----Original Message-----
From: rpg400-l-bounces@xxxxxxxxxxxx [mailto:rpg400-l-bounces@xxxxxxxxxxxx] On Behalf Of Scott Klement
Sent: Wednesday, August 25, 2010 4:38 PM
To: RPG programming on the IBM i / System i
Subject: Re: Creating a unicode file in the IFS
Hi Dennis,
Whenever you're discussing CCSIDs, it's always important to distinguish between the "label" and the "fact". By "label", I mean the number that is assigned to an object in its object description. By "fact", I mean the actual table of ASCII, EBCDIC, etc. that the computer used to generate the original data when it was created. In a perfect world, these would both be the same CCSID, but frequently (especially in data interchange) they are not.
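To make the label/fact distinction concrete, here is a minimal Python sketch. It is only an illustration, not anything IBM i does internally: the CODEC_FOR_CCSID table and the read_as helper are hypothetical names I made up, and I'm assuming Python's "cp037" and "iso-8859-1" codecs are reasonable stand-ins for CCSIDs 37 and 819.

```python
# The "fact": six bytes that were produced using the EBCDIC table (CCSID 37).
data = bytes([0x88, 0x85, 0x93, 0x93, 0x96, 0x25])

# The "label" is just a number stored next to the data; it never changes the bytes.
# Hypothetical mapping from CCSID labels to Python codec names (an assumption).
CODEC_FOR_CCSID = {37: "cp037", 819: "iso-8859-1"}


def read_as(raw: bytes, ccsid_label: int) -> str:
    """Interpret raw bytes according to whatever CCSID the label claims."""
    return raw.decode(CODEC_FOR_CCSID[ccsid_label])


right = read_as(data, 37)   # label matches the fact
wrong = read_as(data, 819)  # label says 819, but the fact is EBCDIC

print(repr(right))  # readable text
print(repr(wrong))  # same bytes, wrong table: garbage
```

The bytes are identical in both calls. Only the label differs, and only one label yields readable text.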
An analogy: Suppose you have made large batches of both strawberry and raspberry jam. (Mmmmm.. homemade jam). You put your jam into jars, and stick a label on each jar that says either "strawberry" or "raspberry".
Otherwise, you might not know (without tasting, anyway) which jam is in which bottle.
In that scenario, you have a "label" (the sticker on the outside) and the "fact" (the actual fruit in the jar.) You hope they are the same -- but there's always a chance for a mistake. A mislabeled jar would result in the great tragedy of eating strawberry instead of raspberry.
(Okay, I guess there have been bigger tragedies in history -- but you get the idea.)
The same is true with your data exchange. It'd be nice if the CCSID in the object description matched the actual encoding used for the data in the file. But it often gets mixed up -- so I suggest not paying too much attention to that file attribute. Don't place too much trust in the "label". Instead, evaluate the data itself.
Another point: Some files aren't text, and therefore the CCSID on the "label" is irrelevant. There is no "fact" for you to place a label on, because the data inside isn't related to ASCII or EBCDIC, it's just raw binary data. (Having a hard time fitting this into the jam analogy -- maybe a jar full of water. It doesn't matter if it says "strawberry" or "raspberry", because it's not jam... not a great analogy, I guess.)
Then you have to deal with computer programs simply NOT KNOWING what the correct encoding is. A person might be able to guess at an encoding by looking at the contents of an object -- but there's no way a computer can do that. Computers lack the intelligence. They make assumptions (defaults) about what things are -- but they expect YOU to solve the problem.
Okay, okay... enough philosophy, let's look at your scenario.
STEP 1: You use 'touch' to create a file with CCSID 37.
Since 'touch' creates an empty file, all you've done is set the label (i.e. put a "strawberry" label on an empty jam jar.) The system is relying on you to use appropriate methods to actually put EBCDIC data into the file. (Or to use software that ignores the CCSID entirely if you're working with binary data.) In other words: If you label the jar "strawberry", it's still up to you to make sure you put strawberry jam, not raspberry, in the jar.
But I think you've already tested and discovered that you, indeed, had the right data in the file... so on to the next step.
STEP 2: Use 'tar' to create a tarball.
Okay, now you need to know what the 'tar' utility is going to do with regard to CCSID translation. Bear in mind that 'tar' was invented for Unix, and Unix file systems don't typically store a "label" for the CCSID of an object. So the TAR program isn't going to keep track of your CCSID while it's inside the tarball.
I don't know whether you're using QShell or PASE to create the tarball.
I'm pretty sure we can rely on PASE to make a binary-perfect copy of the file. I'm not so confident about QShell, since QShell is native to IBM i, and IBM i users often create stuff in EBCDIC. Sending an EBCDIC file in a Unix Tape Archive (tar) makes little sense -- so QShell may be trying to translate it to ASCII. Obviously, if your data is binary data (such as a Java JAR file) this would corrupt it.
Let's look at the docs for the QShell 'tar' program:
http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/rzahz/tar.htm
At the bottom of the page, it says (quote):
QIBM_CCSID
The value of the environment variable is the CCSID used
to create files extracted from the archive. There must
be a valid translation from CCSID 819 to the specified
CCSID.
This description is only talking about extraction -- not creation. It doesn't say anything about creating the data in the tarball with regard to CCSID. Hmmm.. let's do a test:
echo "hello" > mytest.txt
$
od -x mytest.txt
0000000 8885 9393 9625
0000006
tar cf hello.tar mytest.txt
$
The tarball itself is a binary document. The CCSID on that document is irrelevant, since it does not contain pure text. The CCSID of 'mytest.txt' is 37, I verified that with WRKLNK. The hex dump of the file (as shown above) shows that the data is indeed EBCDIC... hex 88 is the EBCDIC code for the "h" in the first position.
Now I transfer that tarball to my FreeBSD PC, and I'm careful to use binary mode. (If I weren't, the transfer would corrupt the tarball and I wouldn't be able to extract anything -- so ASCII mode can't be what translates the contents of the file INSIDE the tarball.)
On my FreeBSD box:
$ tar xf hello.tar
$ ls -l mytest.txt
-rw-r--r-- 1 klemscot wheel 6 Aug 25 16:06 mytest.txt
$ od -x mytest.txt
0000000 6568 6c6c 0a6f
0000006
$ cat mytest.txt
hello
As you can see... I had no trouble extracting the data, but the data is now in ASCII, not EBCDIC! It's now viewable in FreeBSD, an ASCII system, and the hex dump confirms that it is indeed ASCII.
The file no longer has a CCSID "label", since FreeBSD has no notion of CCSIDs. (Nor does the FTP protocol, itself, by the way. It has no way of communicating a CCSID from the server to my PC.) The CCSID label is gone and irrelevant now -- but the data is OBVIOUSLY in ASCII.
Let's try it again, back in QShell, this time setting QIBM_CCSID when building hello.tar. The Info Center says something about translating from 819 to QIBM_CCSID when extracting. When creating, does it do the opposite? If so, setting QIBM_CCSID to 819 should stop it from translating. In QShell:
$
rm hello.tar
$
QIBM_CCSID=819 tar cf hello.tar mytest.txt
$
Once again, FTP it to the FreeBSD box in binary mode, and do this:
$ tar xf hello.tar
$ od -x mytest.txt
0000000 6568 6c6c 0a6f
0000006
It's *still* ASCII! QShell's tar utility seems to have decided to translate my EBCDIC file to ASCII, and it doesn't seem to care about QIBM_CCSID. Lovely. Let's try PASE... from QP2TERM:
$
rm hello.tar; tar cf hello.tar mytest.txt
$
Once again, FTP the file to FreeBSD in binary mode, then do:
$ tar xf hello.tar; od -x mytest.txt
0000000 8588 9393 2596
0000006
W00t! The data is still EBCDIC as I intended. So there you go, QShell's tar utility is translating the data to ASCII when it creates the tarball.
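A note on reading the hex dumps above: the EBCDIC dump taken on IBM i starts with 8885, while the same bytes dumped on FreeBSD start with 8588. That is because od -x prints 16-bit words in the host's byte order, not because the data changed. The Python sketch below imitates that formatting. The od_x helper is a hypothetical name, and I'm assuming the FreeBSD PC is a little-endian x86 machine and that Python's "cp037" and "iso-8859-1" codecs stand in for CCSIDs 37 and 819.

```python
def od_x(data: bytes, byteorder: str) -> str:
    """Format bytes the way od -x does: 16-bit words in the given byte order."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    words = [
        int.from_bytes(data[i:i + 2], byteorder)
        for i in range(0, len(data), 2)
    ]
    return " ".join(f"{w:04x}" for w in words)


ebcdic = "hello\n".encode("cp037")       # bytes 88 85 93 93 96 25
ascii_ = "hello\n".encode("iso-8859-1")  # bytes 68 65 6c 6c 6f 0a

print(od_x(ebcdic, "big"))     # 8885 9393 9625  (IBM i, big-endian)
print(od_x(ebcdic, "little"))  # 8588 9393 2596  (little-endian PC)
print(od_x(ascii_, "little"))  # 6568 6c6c 0a6f  (little-endian PC)
```

So the PASE result really is the same six EBCDIC bytes I started with, just displayed with each pair swapped.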
STEP 3: Transfer the file to intermediate places using FTP
Since you used binary mode, I'm not worried about this part of the transfer. If you screw up the binary contents of a tarball, it won't extract. So this part is a non-issue.
STEP 4: Receive the tarball on the destination system via FTP
You get it, and you note that the CCSID of the tarball is now 37. Since a tarball isn't text, it shouldn't make any difference what its CCSID is... Again, the CCSID you see is just a label on the outside of a container. It doesn't matter if you stick a "strawberry jam" label on a rock, since nobody is going to spread the rock on toast and try to eat it.
Likewise, the CCSID on the tarball doesn't matter because nobody is going to try to read it as a text file. I don't know why this one is getting 37... Remember, the PC it's coming from has no notion of CCSIDs, and the FTP protocol has no way of communicating CCSIDs. So presumably, 37 is just the default that's set for new files in your FTP server's configuration. But again, it's irrelevant.
STEP 5: Extract the data using QShell's tar command.
We already know from the Info Center, as well as experiments, that QShell's tar program expects the contents of a tar file to be 819. Why are the new files being created with CCSID 37? Because your QIBM_CCSID is set to 37. But again, that's just the label -- it's not the fact.
But the IBM docs say that whatever CCSID is in QIBM_CCSID is what tar will TRANSLATE the data to... and it's going to translate FROM 819.
Ouch. That means that a Java JAR file will certainly be corrupt -- because JAR is not text.
Granted, when the tarball was created, it translated from EBCDIC->ASCII, and with your QIBM_CCSID set to 37, it should therefore try to translate back from ASCII->EBCDIC, and therefore give you the same result you started with. However, there's a flaw in that logic... there may be characters in CCSID 37 that don't exist in 819. Likewise, there may be characters in CCSID 819 that don't exist in 37. Therefore, you can't assume that an EBCDIC->ASCII translation, followed by an ASCII->EBCDIC translation will result in the same file at the end.
Worse, if you're extracting a tar file created on a non-IBM i system (where the data wasn't originally run through a translation table when the tarball was created) the results will be significantly different from the input file.
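To see why a text translation is fatal for binary data such as a JAR file, here is a small Python sketch. It is not what QShell's tar actually runs. The translate helper is a hypothetical name, and I'm assuming Python's "iso-8859-1" and "cp037" codecs approximate the 819-to-37 translation described in the IBM docs.

```python
# First four bytes of a ZIP/JAR local file header: "PK" followed by 0x03 0x04.
jar_header = bytes([0x50, 0x4B, 0x03, 0x04])


def translate(raw: bytes, from_codec: str, to_codec: str) -> bytes:
    """Treat raw bytes as text in one code page and re-encode them in another."""
    return raw.decode(from_codec).encode(to_codec)


# What a text-mode extraction would do if it translated from 819 to 37.
extracted = translate(jar_header, "iso-8859-1", "cp037")

print(jar_header.hex(" "))      # original signature bytes
print(extracted.hex(" "))       # bytes after "translation"
print(extracted == jar_header)  # False: the signature no longer matches
```

Java looks for that exact signature, so once the bytes have been run through a translation table, complaints about a corrupt archive are what I'd expect.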
At this point, I'd test to see if I understood the IBM docs correctly -- but frankly, this message is taking too long to write, and it's a pretty much moot point, anyway.
The solution is not to use the TAR command in QShell, because it doesn't do what you expect. It treats the data as text and translates it to ASCII -- which is not what you want. Use the PASE tar utility, or get a 3rd party Unix archiver (such as 7zip) working in PASE... Or simply use the JAR utility to archive things. In any case, it should solve your problem.
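If you want to check whether a given archiver is byte-preserving, compare a member's bytes before and after a round trip. The sketch below does that with Python's tarfile module, purely as a stand-in. It is neither the QShell nor the PASE tar, and the file name and payload are just the ones from my test above.

```python
import io
import tarfile

# EBCDIC bytes, like the contents of mytest.txt in the experiment.
payload = "hello\n".encode("cp037")

# Build the archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="mytest.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read it back and compare byte for byte.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    extracted = tar.extractfile("mytest.txt").read()

print(extracted == payload)  # True: nothing was translated
```

An archiver that passes this kind of comparison is safe for JAR files and other binaries. One that fails it is treating your data as text.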
On 8/25/2010 3:11 PM, Dennis Lovelady wrote:
Slight variation of this topic. I primarily work with systems in two
countries: Germany and USA. On each of these systems, my profile's
CCSID is set to 37. DSPJOB option 2 shows 37 for CCSID, 37 for Default CCSID.
When I go into QSH on either system and type a command like "touch
new_file" then a new file will be created with CCSID 37. So far so good.
But now I want to interchange files from one system with the other.
The files I want to interchange (from the German system) are all CCSID
819. I use a command like "tar cvf mytar.tar path_to_files" to package these files.
Interestingly the tarball is created with CCSID 819. (I don't know
why this would be, but I like it - I think.)
Now, I ftp the file to my PC. Because of security restrictions, in
order to get the file to the USA system, I have to make a pit stop at a local server:
FTP to intermediate server from PC. Log in to USA system and FTP from
intermediate to me. All FTP done in strict BINARY mode.
Now the tarball is CCSID 37 on the USA system. But I'm successful
(apparently) in untarring with tar xvf.
However, the untarred files are all CCSID 37. More importantly, they
don't operate as expected. For example, I usually do this with .JAR
files, and I invariably end up with complaints from JAVA about the
central directory being corrupted or some such. I've also tried
certain binaries (such as the AIX ZIP/UNZIP suite), and those don't
run either. The QSH command "setccsid" doesn't seem to help (although
it does set the CCSID of the file), and I'm quite confused as to why this would be.
Any pointers?
Dennis Lovelady
http://www.linkedin.com/in/dennislovelady
--
"A lawyer is a learned gentleman who rescues your estate from your
enemies... and keeps it for himself."
-- Henry Brougham