Thanks for the PDF. Small typo in it: UTF-8 can also have 4 bytes per code point.
Personally, when dealing with Unicode, I've standardized as such:

- DB: UTF-16 with normalization (for most business purposes in a multilingual setting, 99% of the time you get a single code point per 2 bytes; the occasional 4-byte code point can happen).
- Internals/logic: UTF-16
- The occasional Unicode display on 5250 with proper fonts: UTF-16
- API and web: the usual UTF-8
- String querying and modification: service programs over QICU (unfortunately IBM still ships an ancient ICU there; usable, but more than 10 years old. See the RFEs. I think they are limited by the compilers.)
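For what it's worth, those conventions map to RPG declarations roughly like this (a sketch only; the field names are made up, and CCSID(*UTF8) and explicit numeric CCSIDs assume a release that supports them):

```
// Hypothetical declarations illustrating the CCSID conventions above.
dcl-s dbText   ucs2(64) ccsid(1200);       // UTF-16 for DB and internals
dcl-s webText  varchar(256) ccsid(*utf8);  // UTF-8 at API/web boundaries
```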
my 2c
ciao
On Wednesday, February 25, 2026 at 02:03:37 PM GMT+1, Barbara Morris <bmorris@xxxxxxxxxx> wrote:
On 2026-02-25 5:41 a.m., Gad Miron wrote:
Hello Barbara
The (source) ds_NMMLIMP2.ITEMNAME looks OK - I'll check again though
Does the %SUBST function "know" how to deal with VARLEN data?
I'll check the CHARCOUNT thing.
%subst understands VARLEN data.
ds_NMWRKR.ITEMNAME8 = %subst(ds_NMMLIMP2.ITEMNAME :1 : 128) ;
But I don't think you should use %SUBST here.
The target field ds_NMWRKR.ITEMNAME8 has a length of 128 so if the
source field ds_NMMLIMP2.ITEMNAME has data that is longer than 128, it
would be automatically truncated.
If the source field ds_NMMLIMP2.ITEMNAME has data that is shorter than
128, the %SUBST would fail with RNX0100 ("Length or start position is
out of range for the string operation"). The start and length for %SUBST
refer to the current VARLEN length so 128 would not be allowed unless
the length of the source is 128 or greater.
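To make the failure mode concrete, here is a hedged sketch (using the field names from the thread): %SUBST positions refer to the *current* length of a VARLEN field, so a fixed length of 128 raises RNX0100 whenever %LEN(source) < 128. Guarding the length with %MIN avoids that, assuming a release with the %MIN built-in:

```
// Sketch only: limit the %SUBST length to the current VARLEN length.
dcl-s copyLen int(10);

copyLen = %min(%len(ds_NMMLIMP2.ITEMNAME) : 128);
if copyLen > 0;
  ds_NMWRKR.ITEMNAME8 = %subst(ds_NMMLIMP2.ITEMNAME : 1 : copyLen);
endif;
```

That said, the guard only reproduces what a plain assignment already does, which is the point of the advice that follows.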
Use an ordinary assignment instead:
ds_NMWRKR.ITEMNAME8 = ds_NMMLIMP2.ITEMNAME;
But this assignment without %SUBST would still be a problem if the data
in ds_NMMLIMP2.ITEMNAME is longer than 128 bytes and bytes 128 and 129
contain a 2-byte Hebrew character. ds_NMWRKR.ITEMNAME8 would only get
the first byte of that character (in CHARCOUNT STDCHARSIZE mode).
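The split-character risk can be sketched like this (a hypothetical illustration, not the poster's actual declarations; CCSID(*UTF8) assumes a release that supports it):

```
// Hypothetical fields standing in for the DS subfields in the thread.
dcl-s source varchar(200) ccsid(*utf8);  // mixed Hebrew/Latin UTF-8 data
dcl-s target char(128) ccsid(*utf8);

// In the default CHARCOUNT STDCHARSIZE mode the assignment counts
// bytes. If bytes 128-129 of SOURCE are one 2-byte Hebrew character,
// TARGET receives only its first byte -- an invalid UTF-8 sequence.
target = source;
```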
I did a little experiment with some CCSID 424 (Hebrew) data and all the
Hebrew characters were 2 bytes in UTF-8. It seems likely that would be
true of all Hebrew characters. If your source string has a mixture of
Hebrew (2-bytes per character) and non-Hebrew (1-byte per character), I
think you'd have a 50/50 chance of having invalid data in the target
string after the assignment in STDCHARSIZE mode.
Adding CHARCOUNTTYPES(*UTF8) and then doing the assignment in CHARCOUNT
NATURAL mode would allow RPG to handle the UTF-8 data correctly and
possibly only assign the first 127 bytes instead of splitting the last
character.
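A hedged sketch of that suggestion (keyword spellings as I understand the 7.4+ CHARCOUNT support; please verify against the ILE RPG Reference for your release):

```
ctl-opt charcount(*natural)       // count characters, not bytes
        charcounttypes(*utf8);    // apply natural counting to UTF-8 fields

dcl-s source varchar(200) ccsid(*utf8);
dcl-s target char(128) ccsid(*utf8);

// In CHARCOUNT NATURAL mode the compiler truncates on a character
// boundary, so a 2-byte character that does not fit is dropped whole
// (e.g. 127 bytes copied) instead of being split.
target = source;
```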