Thanks for the PDF. Small typo in it: UTF-8 can also have 4 bytes per code point.
Personally, when dealing with Unicode, I've standardized as such:

- DB: UTF-16 with normalization (for most business purposes in a multilingual setting, 99% of the time you get a single code point per 2 bytes; the occasional 4-byte code point can happen).
- Internals/logic: UTF-16
- The occasional Unicode display on 5250 with proper fonts: UTF-16
- API and web: the usual UTF-8
- String querying and modification: service programs over QICU (unfortunately IBM still ships an ancient ICU there; usable, but more than 10 years old. See the RFEs. I think they are limited by the compilers.)
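For what it's worth, those conventions map to RPG declarations roughly like this (a sketch only; the field names are made up, and CCSID(*UTF8) and explicit numeric CCSIDs assume a release that supports them):

```
// Hypothetical declarations illustrating the CCSID conventions above.
dcl-s dbText   ucs2(64) ccsid(1200);       // UTF-16 for DB and internals
dcl-s webText  varchar(256) ccsid(*utf8);  // UTF-8 at API/web boundaries
```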
my 2c
ciao
On Wednesday, February 25, 2026 at 02:03:37 PM GMT+1, Barbara Morris <bmorris@xxxxxxxxxx> wrote:
On 2026-02-25 5:41 a.m., Gad Miron wrote:
Hello Barbara
The (source) ds_NMMLIMP2.ITEMNAME looks OK - I'll check again though
Does the %SUBST function "know" how to deal with VARLEN data?
I'll check the CHARCOUNT thing.
%subst understands VARLEN data.
ds_NMWRKR.ITEMNAME8 = %subst(ds_NMMLIMP2.ITEMNAME :1 : 128) ;
But I don't think you should use %SUBST here.
The target field ds_NMWRKR.ITEMNAME8 has a length of 128 so if the
source field ds_NMMLIMP2.ITEMNAME has data that is longer than 128, it
would be automatically truncated.
If the source field ds_NMMLIMP2.ITEMNAME has data that is shorter than
128, the %SUBST would fail with RNX0100 ("Length or start position is
out of range for the string operation"). The start and length for %SUBST
refer to the current VARLEN length so 128 would not be allowed unless
the length of the source is 128 or greater.
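To make the failure mode concrete, here is a hedged sketch (using the field names from the thread): %SUBST positions refer to the *current* length of a VARLEN field, so a fixed length of 128 raises RNX0100 whenever %LEN(source) < 128. Guarding the length with %MIN avoids that, assuming a release with the %MIN built-in:

```
// Sketch only: limit the %SUBST length to the current VARLEN length.
dcl-s copyLen int(10);

copyLen = %min(%len(ds_NMMLIMP2.ITEMNAME) : 128);
if copyLen > 0;
  ds_NMWRKR.ITEMNAME8 = %subst(ds_NMMLIMP2.ITEMNAME : 1 : copyLen);
endif;
```

That said, the guard only reproduces what a plain assignment already does, which is the point of the advice that follows.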
Use an ordinary assignment instead:
ds_NMWRKR.ITEMNAME8 = ds_NMMLIMP2.ITEMNAME;
But this assignment without %SUBST would still be a problem if the data
in ds_NMMLIMP2.ITEMNAME is longer than 128 bytes and bytes 128 and 129
contain a 2-byte Hebrew character. ds_NMWRKR.ITEMNAME8 would only get
the first byte of that character (in CHARCOUNT STDCHARSIZE mode).
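The split-character risk can be sketched like this (a hypothetical illustration, not the poster's actual declarations; CCSID(*UTF8) assumes a release that supports it):

```
// Hypothetical fields standing in for the DS subfields in the thread.
dcl-s source varchar(200) ccsid(*utf8);  // mixed Hebrew/Latin UTF-8 data
dcl-s target char(128) ccsid(*utf8);

// In the default CHARCOUNT STDCHARSIZE mode the assignment counts
// bytes. If bytes 128-129 of SOURCE are one 2-byte Hebrew character,
// TARGET receives only its first byte -- an invalid UTF-8 sequence.
target = source;
```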
I did a little experiment with some CCSID 424 (Hebrew) data and all the
Hebrew characters were 2 bytes in UTF-8. It seems likely that would be
true of all Hebrew characters. If your source string has a mixture of
Hebrew (2-bytes per character) and non-Hebrew (1-byte per character), I
think you'd have a 50/50 chance of having invalid data in the target
string after the assignment in STDCHARSIZE mode.
Adding CHARCOUNTTYPES(*UTF8) and then doing the assignment in CHARCOUNT
NATURAL mode would allow RPG to handle the UTF-8 data correctly and
possibly only assign the first 127 bytes instead of splitting the last
character.
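A hedged sketch of that suggestion (keyword spellings as I understand the 7.4+ CHARCOUNT support; please verify against the ILE RPG Reference for your release):

```
ctl-opt charcount(*natural)       // count characters, not bytes
        charcounttypes(*utf8);    // apply natural counting to UTF-8 fields

dcl-s source varchar(200) ccsid(*utf8);
dcl-s target char(128) ccsid(*utf8);

// In CHARCOUNT NATURAL mode the compiler truncates on a character
// boundary, so a 2-byte character that does not fit is dropped whole
// (e.g. 127 bytes copied) instead of being split.
target = source;
```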