Re: CSV files with Byte Order Marks -- RPG400-L

On 6/19/2018 2:24 PM, Soucy, Michael wrote:

What's the CCSID of your job, your QCCSID system value, the DB2 tables involved?

I don't mean to sound like an idiot here, but how do I check that? I'm just running Scott's utility from green screen in debug mode.

For your job:
DSPJOB, option 2. Page down 3(?) times; it's the Coded character set
identifier.

For your system value:
DSPSYSVAL QCCSID

For your files:
DSPFD lib/file. Page down 2(?) times; it's the CCSID.

If there's a 65535 in the mix, that could be the issue, because 65535 means 'Do not translate this binary data'.

The point of this line of inquiry is to determine what conversions might
be necessary. Maybe we need to step back a bit more and talk a bit
about text conversions in general.

When the distant ancestor of IBM i (System/38 CPF) was released, it
shipped with a single character set, or encoding: EBCDIC. Years later,
IBM introduced other CCSIDs to support other languages like Turkish,
Danish, Thai, etc. When that happened, they created a new system value
named QCCSID, and for the sake of compatibility, defaulted it to 65535,
meaning no translation / no conversion. Where I am in the US, the
'expected' value would be 37 - US English.

Let's keep the example simple, and imagine that we want to export a file
of EBCDIC text off to an ASCII PC. Generally, there is a one-to-one
mapping of 'characters' in EBCDIC and ASCII. In EBCDIC, the number 1 is
x'F1'. In ASCII, it's x'31'. So we might have an item number on IBM i
as '12345' - x'F1F2F3F4F5'. If we sent those bits over to an ASCII
machine and opened it in Notepad, it would be gibberish. What needs to
happen is a conversion of the bits from x'F1F2F3F4F5' to x'3132333435'.
This is the sort of thing that IBM i Access does when it does a File
Transfer, the same thing FTP does when in Text mode, the same thing any
program does when reading text in one character set and writing in another.
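
Here is a minimal sketch of that conversion in Python, just to make the
bit patterns concrete. It assumes Python's "cp037" codec is a fair
stand-in for CCSID 37; the transcode() helper is a name I made up for
illustration, not something from IBM i Access or FTP.

```python
# Sketch: convert the item number from the example between EBCDIC and ASCII.
# Assumption: Python's "cp037" codec stands in for CCSID 37.

def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding, then re-encode in the target."""
    return data.decode(source).encode(target)

ebcdic_item = bytes.fromhex("F1F2F3F4F5")    # '12345' as stored on IBM i
ascii_item = transcode(ebcdic_item, "cp037", "ascii")

print(ebcdic_item.hex().upper())    # F1F2F3F4F5
print(ascii_item.hex())             # 3132333435
print(ascii_item.decode("ascii"))   # 12345
```

The characters are the same on both sides; only the bytes that
represent them change.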

That's the same general idea needed when talking about any two,
different CCSIDs. The bit encoding for any particular individual
character might differ between the CCSIDs, and so any text needs those
bit patterns converted from the source encoding to the target encoding.

The most common conversion we face is probably between EBCDIC (37) and
ASCII (819) or Windows (1252). But we're starting to see conversions
between EBCDIC (37) and UTF-8 (1208). That might be where you are, but
we haven't yet established which CCSIDs are in use at your site.

A conversion error might ensue if the program is trying to convert from
UTF-8 (1208) to Binary/No conversion (65535), which is why I started you
looking in that direction. That's not a program issue - there isn't any
conversion possible. Remember, we can only convert textual characters.
Files like JPG or TIFF need their binary bit patterns to remain exactly
as-is.
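
To see why binary data has to be left alone, here is a small Python
sketch that pushes the first four bytes of a typical JPEG through a text
conversion. This only illustrates the damage a text conversion does to
binary data; it is not a model of how IBM i treats 65535. The codecs
"cp037" and "latin-1" are assumed stand-ins for CCSIDs 37 and 819.

```python
# Sketch: what a text conversion does to binary data.
# Assumption: "cp037" ~ CCSID 37 and "latin-1" ~ CCSID 819.

jpeg_header = bytes.fromhex("FFD8FFE0")   # typical first bytes of a JPEG/JFIF file

# Treat the bytes as if they were EBCDIC text and convert them to ISO 8859-1.
converted = jpeg_header.decode("cp037").encode("latin-1")

print(jpeg_header.hex())          # bytes as stored in the file
print(converted.hex())            # bytes after the "conversion"
print(converted == jpeg_header)   # False: the image data is now corrupt
```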

A conversion error might also ensue if the file's CCSID tag is lying
about what's inside the file; what if the ADP file really is UTF-8 but
the CCSID of that file is 1252 (Windows)? The system will use the wrong
conversion table, and if it comes across a byte sequence that is invalid
in 1252, it'll roll over because it doesn't know what to do with it.
That's why I wanted you to look at the IFS file in hex - to see exactly
what bit patterns are physically stored in there.
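
As a rough illustration of the mislabeled-file case, here is a Python
sketch. The sample bytes are hypothetical (I have not seen your ADP
file), and Python's "cp1252" codec is stricter about undefined bytes
than some converters, so the exact failure on IBM i may look different.

```python
import codecs

# Hypothetical sample: a small UTF-8 CSV that starts with a byte order mark.
raw = codecs.BOM_UTF8 + "Name\nÁlvarez\n".encode("utf-8")

# Look at the physical bytes, the same idea as viewing the IFS file in hex.
print(raw[:8].hex(" "))   # ef bb bf 4e 61 6d 65 0a

# A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
has_bom = raw.startswith(codecs.BOM_UTF8)

# Decode with the wrong table: UTF-8 'Á' is C3 81, and 0x81 is undefined
# in Python's cp1252 codec, so the conversion fails.
try:
    raw.decode("cp1252")
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "failed"

# Decode with the right table; "utf-8-sig" also strips the BOM.
text = raw.decode("utf-8-sig")

print(has_bom, outcome, repr(text))
```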

Sorry for the length. If I were smarter I could make it shorter :-(