Re: Restructuring program to cope with modern database layouts -- RPG400-L

On 2025-05-15 2:51 p.m., cesco via RPG400-L wrote:

> Because of ongoing issues with national characters, a decision was agreed upon to migrate all the data to SQL created tables, using "modern" features like varchar fields, and CCSID 1208, UTF-8. This should enable transparent passthrough of textual data for ODBC

Personally, when full Unicode storage is required, I have standardized on UTF-16 (CCSID 1200) with *NORMALIZE on, to get some sane canonical form at the database layer,
and on CCSID 1208 only when external serialization requires it (text file, REST JSON API...).
Both are variable-length encodings: CCSID 1200 uses 2 or 4 bytes per code point, and 2 bytes cover the full BMP, which is usually enough for any conceivable business purpose.
CCSID 1200 is displayable even on a 5250 in Unicode mode (i.e. you can display Cyrillic and Arabic on the same screen), although unfortunately this is not often seen in the field.
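
To show why a normalized form matters at the database layer, here is a minimal sketch in Python (standard library only, not RPG or SQL). It assumes the database-level normalization targets a composed form such as NFC; the helper name nfc is hypothetical.

```python
import unicodedata

# Two spellings of the same visible character "é".
composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def nfc(text: str) -> str:
    """Return text in Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# Without normalization the two strings compare unequal,
# so key lookups and joins on such data would miss.
print(composed == decomposed)            # False

# After normalization both spellings collapse to the same value.
print(nfc(composed) == nfc(decomposed))  # True
```

Data arriving from different clients can use either spelling, which is why normalizing once on the way into the table tends to be simpler than handling both forms in every comparison.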

RPG converts between CCSIDs during variable assignment; if a character cannot be converted, it usually places x'3F' (the substitution character) in the EBCDIC target.
In any case, many important downstream systems you have to talk to in normal business settings (couriers, for example) are usually very restricted, so it is better to transform strings with a proper library like ICU (the QICU library on the /400); a transform call can remove umlauts, accents and so on...
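
As a rough illustration of that kind of accent folding, here is a sketch in Python using only the standard unicodedata module. This is not ICU and not the QICU API; strip_marks is a hypothetical helper that decomposes each character and drops the combining marks.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks("Müller"))  # Muller
print(strip_marks("José"))    # Jose
# Letters without a canonical decomposition, such as "Ł", pass through unchanged.
print(strip_marks("Łódź"))    # Łodz
```

The last line shows the limit of this simple approach: a character like Ł is not a base letter plus an accent, so it survives. A full transliteration library is the better choice when the target system accepts only plain Latin letters.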

I would standardize on UTF16 for the database too, and only use UTF8 when required.

UTF8 can be tricky because a character can take 1 to 4 bytes, which complicates string processing.

RPG does support UTF8 quite well now, but it requires extra care when coding. (See the discussion of CHARCOUNT in the ILE RPG manual.)
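
The byte-versus-character distinction can be sketched as follows. This is Python rather than RPG, so it only illustrates the underlying problem, not the RPG syntax; truncate_utf8 is a hypothetical helper.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # A cut in the middle of a multi-byte character leaves an incomplete
    # trailing sequence; errors="ignore" drops it instead of raising.
    return encoded.decode("utf-8", errors="ignore")


word = "Grüße"
print(len(word))                  # 5 characters
print(len(word.encode("utf-8")))  # 7 bytes: ü and ß take 2 bytes each

# A naive 3-byte cut would end in the middle of "ü".
print(truncate_utf8(word, 3))     # Gr
print(truncate_utf8(word, 4))     # Grü
```

Any logic that treats a field length, substring position or scan result as a byte count has the same problem, which is why mixing byte-based and character-based operations on UTF8 data needs care.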

UTF16 CAN also have the same issue with different length characters, but it's very rare to have 4-byte characters. I'm not sure, but I think it's only things like emojis that have 4 byte characters.
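
A quick way to check which characters need 4 bytes in UTF16 is to count 16-bit code units, sketched here in Python (utf16_units is a hypothetical helper). Everything outside the BMP needs two units; that includes emojis, but also some rarer CJK ideographs and other supplementary-plane characters.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


samples = {
    "LATIN CAPITAL A": "A",
    "EURO SIGN": "\u20ac",
    "GRINNING FACE emoji": "\U0001F600",
    "CJK ideograph U+20BB7": "\U00020BB7",
}

for name, ch in samples.items():
    # BMP characters print 1; characters above U+FFFF print 2 (a surrogate pair).
    print(name, utf16_units(ch))
```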

To avoid "silent errors" when RPG has to place the x'3F' replacement character because a character couldn't be converted, code CCSIDCVT(*EXCP) in the H spec so you get an exception (RNX0452) when that happens.
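
The difference between silent substitution and an exception can be illustrated outside RPG as well. The sketch below is Python, not RPG: it uses Python's cp037 codec as a stand-in EBCDIC code page, and to_ebcdic is a hypothetical helper. Note that Python substitutes a question mark rather than x'3F', so only the behaviour is comparable, not the byte values.

```python
def to_ebcdic(text: str, strict: bool) -> bytes:
    """Encode text to EBCDIC code page 037, strictly or with substitution."""
    return text.encode("cp037", errors="strict" if strict else "replace")


name = "Łódź"  # "Ł" and "ź" are not in code page 037; "ó" and "d" are

# Lenient mode: the data is damaged, but nothing signals it.
lossy = to_ebcdic(name, strict=False)
print(lossy.decode("cp037"))  # ?ód?

# Strict mode: the failed conversion surfaces as an exception.
try:
    to_ebcdic(name, strict=True)
except UnicodeEncodeError as err:
    print("conversion failed:", err.reason)
```

The strict variant is the analogue of getting an exception instead of a substitution character: the bad data is caught at the point of conversion rather than discovered later in a file or report.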

When changing older programs to use Unicode in the database, there's always the potential of still having some job-CCSID work fields where those conversion errors can creep in. You can also code the *LIST option for the CCSIDCVT keyword to get a section in the compile listing that lists every CCSID conversion that's done in the module, along with a warning message if the conversion might not be able to convert the data.

