Re: %Subst and %len with UTF-8 data -- RPG400-L

OK, but then the documentation is, let's say, not very precise :-)

The reference says for %len:
- 'For character, graphic, or UCS-2 expressions the value returned is the number of characters in the value
of the expression.'
- 'For all other data types, the value returned is the number of bytes of the value.'

That implies that UTF-8 and UTF-16 are considered to be 'other data types'. But I can't find anything that says that CCSID(*UTF8) and CCSID(*UTF16) have implications for the data type.

Anyway, I'll take the hint for an RFE :-)

Joep Beckeringh

On 5-8-2020 at 00:23, Barbara Morris wrote:

On 2020-07-30 8:09 a.m., Joep Beckeringh via RPG400-L wrote:

Yes, I know. But in the case of ChrUtf the binary value contains 3, while the string consists of 1 character (€) that uses 3 bytes. And the documentation states that %len returns the number of characters.

For UTF-8, %LEN returns the number of bytes. String operations like %SUBST work on bytes too.
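To make the byte-versus-character distinction concrete, here is a small sketch in Python rather than RPG. That choice is an assumption made purely for illustration; it does not call any RPG built-in, and the variable names are hypothetical. It counts the euro sign both ways and shows what a byte-based substring can do to a multi-byte character.

```python
# Illustration only: this is Python, not RPG. It mimics byte-based
# length and substring semantics on UTF-8 data.
text = "€"                     # one character, U+20AC
data = text.encode("utf-8")    # three bytes: E2 82 AC

char_count = len(text)         # counts characters
byte_count = len(data)         # counts bytes, like the 3 reported in the thread

# A byte-based substring of length 2 cuts the character in the middle.
partial = data[:2]
try:
    partial.decode("utf-8")
    partial_is_valid = True
except UnicodeDecodeError:
    # The first two bytes alone are not a complete UTF-8 sequence.
    partial_is_valid = False
```

So a byte-oriented substring is only safe when the start and length happen to fall on character boundaries.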

For UCS-2, %LEN returns the number of double bytes, and string operations work on double bytes. So you might encounter the same issue with characters that have 4 bytes. But it is much rarer to encounter this situation for UCS-2 data than for UTF-8 data.
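The same effect for double-byte units can be sketched in Python, again only as an illustration and assuming the data is encoded as UTF-16. The helper name utf16_units is hypothetical. A character in the Basic Multilingual Plane takes one double-byte unit, while a character outside it takes two.

```python
# Illustration only: this is Python, not RPG. It counts 2-byte units
# in UTF-16 data, which is the unit described for UCS-2 above.
def utf16_units(s: str) -> int:
    """Return the number of 2-byte units in the UTF-16 encoding of s."""
    # "utf-16-le" is used so that no byte order mark is added.
    return len(s.encode("utf-16-le")) // 2

euro = "€"              # U+20AC, fits in a single 2-byte unit
clef = "\U0001D11E"     # U+1D11E (musical G clef), needs a surrogate pair

euro_units = utf16_units(euro)
clef_units = utf16_units(clef)
clef_bytes = len(clef.encode("utf-16-le"))
```

Here the single character in clef occupies 4 bytes, so a count of double bytes reports 2 for what a reader sees as 1 character.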

This issue has always existed in RPG for mixed SBCS/DBCS data, where the number of bytes is not necessarily equal to the number of characters.

If RPG does change how it handles string functions, truncation, %LEN, etc., for data where characters are not all the same size, it will need some new syntax to indicate that it should behave in the new way. Possibly a compiler directive, or new built-in functions, or some new parameter for existing built-in functions.

There is not currently any RPG RFE for this.
