On 2020-07-30 8:09 a.m., Joep Beckeringh via RPG400-L wrote:
Yes, I know. But in the case of ChrUtf the binary value contains 3, while the string consists of 1 character (€) that uses 3 bytes. And the documentation states that %len returns the number of characters.
For UTF-8, %LEN returns the number of bytes. String operations like %SUBST work on bytes too.
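To make the byte-versus-character distinction concrete, here is a small sketch in Python (not RPG; it only illustrates the counting, and the variable names are made up for the example). It shows that "€" is one character but three UTF-8 bytes, and that a substring taken by byte position can cut a character in half.

```python
# Illustration in Python, not RPG: character count versus UTF-8 byte count.
euro = "€"                     # one character, U+20AC
utf8 = euro.encode("utf-8")    # three bytes: E2 82 AC

char_count = len(euro)         # counts characters
byte_count = len(utf8)         # counts bytes, like a byte-based length

# A byte-based substring can stop in the middle of a character.
first_two_bytes = utf8[:2]
try:
    first_two_bytes.decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    # Two of the three bytes do not form a complete UTF-8 sequence.
    split_is_valid = False
```

Here `char_count` is 1 and `byte_count` is 3, which matches the 3 reported for ChrUtf in the question.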
For UCS-2, %LEN returns the number of double bytes, and string operations work on double bytes. So you might encounter the same issue with characters that take 4 bytes (two double bytes). But it is much rarer to encounter this situation for UCS-2 data than for UTF-8 data.
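The double-byte case can be sketched the same way in Python (again not RPG). This assumes the data is really UTF-16, where a character outside the Basic Multilingual Plane is stored as a pair of double bytes; the helper `utf16_units` is a hypothetical name used only for this example.

```python
# Illustration in Python, not RPG: counting 2-byte units in UTF-16 data.
def utf16_units(text: str) -> int:
    """Return the number of 2-byte units when text is encoded as UTF-16 (no BOM)."""
    return len(text.encode("utf-16-be")) // 2

euro_units = utf16_units("€")     # U+20AC fits in one double byte
emoji_units = utf16_units("😀")   # U+1F600 needs two double bytes (4 bytes)
emoji_chars = len("😀")           # still a single character
```

Most everyday text stays within one double byte per character, which is why the mismatch shows up far less often here than with UTF-8.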
This issue has always existed in RPG for mixed SBCS/DBCS data, where the number of bytes is not necessarily equal to the number of characters.
If RPG does change how it handles string functions, truncation, %LEN, etc. for data where characters are not all the same size, it will need some new syntax to indicate that it should behave in the new way: possibly a compiler directive, new built-in functions, or a new parameter for existing built-in functions.
There is not currently any RPG RFE for this.
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.