Re: Base64 decode -- RPG400-L

Hello Greg,

Comments in-line:

On 8/11/2020 11:22 AM, Greg Wilburn wrote:

Thanks for the reply. For whatever reason I struggle with the whole encoding thing.
I guess I (incorrectly) assumed the base64 encoded element was UTF-8 because the XML document was UTF-8?

The purpose of base64 is to ensure the integrity of the underlying binary value. To put it another way: Bytes go in (encoding), and the exact same bytes come out (decoding).

For example, if I take the ISO-8859-1 string "Scott", it has a hex value of 53 63 6f 74 74

If I base64 encode it, it will result in U2NvdHQ=

It doesn't matter whether the U2NvdHQ= is encoded as UTF-8, ASCII, or EBCDIC, or whether it's single byte, double byte, or any of 1000 other encodings. When it is decoded, it will go right back to the same hex string of 53 63 6f 74 74.
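
If it helps to see that in action, here's a rough sketch. It's in Python rather than RPG, only because it's easy to run anywhere, and it assumes Python's "cp037" codec as a stand-in for EBCDIC. The variable names are made up for the illustration.

```python
import base64

# The ISO-8859-1 bytes for "Scott": 53 63 6f 74 74
original = "Scott".encode("iso-8859-1")

# base64 output is plain text; here it is held as a Python string
b64_text = base64.b64encode(original).decode("ascii")

# Carry that same base64 text in three different character encodings
carriers = {name: b64_text.encode(name) for name in ("ascii", "utf-8", "cp037")}

# Read each one back with its matching encoding, then base64-decode it
results = {name: base64.b64decode(raw.decode(name)) for name, raw in carriers.items()}

for name, decoded in results.items():
    print(name, decoded.hex(" "))
```

The EBCDIC copy of the base64 text has completely different byte values than the ASCII copy, yet all three decode to the same five bytes.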

That's the purpose of base64 -- to allow data to retain the exact same binary value, even if it is transferred over a text medium. The most common purpose in the early days was to transfer photos in e-mail. In an image file like a photo, the byte values don't represent letters or text, they represent stuff like colors and pixels to be drawn on the screen. If you translated those byte values from (for example) ASCII to EBCDIC, the picture would become completely corrupt. Since e-mail is a text medium, it was not safe to send pictures through e-mail until base64 encoding made it possible. (Actually, there was an earlier system called uuencoding that was used prior to base64, but it has mostly been replaced by base64... but, you get the idea.)
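
To make the corruption concrete, here's a small Python sketch (again just an illustration, not RPG). The "pixels" bytes are arbitrary made-up values standing in for image data, and "cp037" is assumed as the EBCDIC flavor.

```python
import base64

# Hypothetical "image" bytes: arbitrary binary values, not text
pixels = bytes([0x00, 0x41, 0x7A, 0x30])

# The mistake: treat the bytes as ISO-8859-1 text and translate to EBCDIC
translated = pixels.decode("iso-8859-1").encode("cp037")

# The safe route: base64-encode first, then let the *text* be translated
# to EBCDIC and back before decoding
as_text = base64.b64encode(pixels).decode("ascii")
via_ebcdic = as_text.encode("cp037").decode("cp037")
restored = base64.b64decode(via_ebcdic)

print("original:  ", pixels.hex(" "))
print("translated:", translated.hex(" "))
print("restored:  ", restored.hex(" "))
```

The directly translated bytes no longer match the original, while the bytes that travelled as base64 text come back unchanged.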

So, yes... your XML document was UTF-8 (until you translated to EBCDIC, anyway!) But that means the XML tags like <xxxxxx> were in UTF-8. The encoded data (like my "U2NvdHQ=" example) was written in UTF-8 characters, but once decoded, it'll have the exact same byte values that were input by whoever encoded it.

I took your (original) advice and removed the CCSID from my RPG variable that receives the base64_decode.
Then I changed my open() API to use CCSID = 0

I agree with removing the CCSID from the RPG variable. I don't agree with using 0 with open().

fd = open(Stmf:flag:mode:0:0);
rc = write(fd: %addr(base64decoded): %len(%trim(base64decoded)));

So CCSID = 0 means "My job's flavor of EBCDIC". Which is not right unless whoever encoded the data was using the same flavor of EBCDIC that you are. That's almost certainly not the case. Also, how these CCSIDs are used will depend on what you have in the 'flag' parameter, which you haven't shown us.

What you need to do is find out what the actual encoding of the ZPL label was before it was base64 encoded. Then, you need to call open() and tell it the CCSID that corresponds to that encoding. Do NOT use O_TEXTDATA on the open() call because the data is already in that encoding, you don't want it to translate it.
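
As a loose analogy only (this is Python, not the IFS open() API, and it doesn't tag the file with a CCSID), writing the decoded bytes in binary mode is the same idea as leaving off O_TEXTDATA: nothing gets translated on the way to disk. The path and file name are made up for the example.

```python
import base64
import os
import tempfile

# Bytes exactly as the sender encoded them
decoded = base64.b64decode("U2NvdHQ=")

# Hypothetical output path, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "label.zpl")

# "wb" is binary mode: the bytes are written as-is, with no translation
with open(path, "wb") as f:
    f.write(decoded)

# Read the file back in binary mode to confirm what landed on disk
with open(path, "rb") as f:
    on_disk = f.read()

print(on_disk.hex(" "))
```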

The CCSID of the IFS file is 37. I can view it using Notepad++ and WRKLNK (although this does indicate: Message . . . . : File CCSID not valid.
Cause . . . . . : The file Coded Character Set Identifier (CCSID) was 00037,
but the data in the file looks like ASCII. A CCSID of 00819 is being used.
Recovery . . . : If another CCSID is needed, use F15 to change to the
desired CCSID. )

So what's happening here is that you've obviously given it the wrong CCSID, so it's guessing at CCSID 819 instead. CCSID 819 is ISO-8859-1.

But, this is a guess... just like UTF-8 was a guess... Rather than take a guess at how it's encoded, ask whoever encoded it!

I'm not sure how would I "know" what encoding the string is once I have decoded it using base64_decode()?
Would this be indicated in the WSDL for the SOAP web service?
The XML response indicates <?xml version="1.0" encoding="utf-8"?>

Asking whoever is encoding the document is the only way I know to find out the proper encoding.

The most commonplace ones would be ISO-8859-1, Windows-1252 or UTF-8. Though, of course, we've already ruled out UTF-8. There's no point in guessing... ask.
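
Here's a quick Python sketch of why guessing isn't reliable. The "decoded" bytes are a made-up payload (not your label data), and the interpretations() helper is hypothetical, written just for this illustration.

```python
# Hypothetical decoded payload: "Price: " followed by byte 0x80 and "5"
decoded = bytes([0x50, 0x72, 0x69, 0x63, 0x65, 0x3A, 0x20, 0x80, 0x35])


def interpretations(data, encodings=("iso-8859-1", "cp1252", "utf-8")):
    """Decode the same bytes under each candidate encoding (None = invalid)."""
    out = {}
    for name in encodings:
        try:
            out[name] = data.decode(name)
        except UnicodeDecodeError:
            out[name] = None
    return out


result = interpretations(decoded)
for name, text in result.items():
    print(name, repr(text))
```

The same bytes are valid in both ISO-8859-1 and Windows-1252, but byte 0x80 is a control character in one and the euro sign in the other. Nothing in the bytes themselves tells you which one the sender meant.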