RPG nowadays can handle UCS-2 (data type ucs2; CCSID 13488), UTF-16 (data type ucs2; CCSID 1200) and UTF-8 (data type char; CCSID 1208). I was dealing with UTF-8.
The nice thing with UCS-2 (and single byte EBCDIC) is that every character takes a fixed number of bytes. In UTF-8 and -16 the number of bytes per character varies. I can imagine that that causes problems with the binary value in varying length fields.
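To make the fixed versus variable width point concrete, here is a small sketch. It is Python rather than RPG, used only to count bytes; the helper name `bytes_per_char` and the sample string are made up for this illustration.

```python
def bytes_per_char(text, encoding):
    """Return the encoded byte length of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]

# 'A', 'é' and '€' are all in the Basic Multilingual Plane
sample = "A\u00e9\u20ac"

# UTF-8: the width varies per character
utf8_widths = bytes_per_char(sample, "utf-8")

# UTF-16 (big-endian, no BOM): 2 bytes for each of these characters
utf16_widths = bytes_per_char(sample, "utf-16-be")

# A character outside the BMP needs a surrogate pair in UTF-16
emoji_utf16 = bytes_per_char("\U0001F600", "utf-16-be")

print(utf8_widths, utf16_widths, emoji_utf16)
```

For these samples the UTF-8 widths come out as 1, 2 and 3 bytes, while UTF-16 needs 2 bytes each and 4 bytes for the character outside the BMP.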
Joep Beckeringh
On 30 Jul 2020, at 22:26, Bruce Vining <bruce.vining@xxxxxxxxx> wrote:
I do not know how "exacting" the RPG documentation is, but I will point out
that UCS-2 <> UTF8.
On Thu, Jul 30, 2020 at 3:34 PM Vernon Hamberg <vhamberg@xxxxxxxxxxxxxxx>
wrote:
I think you are right, Carel - problem with UTF-8 is, that BOM is
optional. Just another thing that makes this game so much fun!
Vern
On 7/30/2020 11:09 AM, Carel wrote:
I think you are dealing with the Byte Order Mark in UTF-8, which is
active for characters that need more than 1 byte to be represented.
But it is still 1 character.
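For reference, at the byte level the UTF-8 BOM is a separate three-byte sequence (x'EFBBBF') that may appear once at the start of a stream; it is not part of the encoding of a multi-byte character such as the Euro sign. A short Python sketch (not RPG, just standard-library codecs) to check the byte values:

```python
import codecs

euro_utf8 = "\u20ac".encode("utf-8")          # plain UTF-8, no BOM added
euro_with_bom = "\u20ac".encode("utf-8-sig")  # same text with a leading BOM

# Does each byte string start with the UTF-8 BOM (EF BB BF)?
has_bom_plain = euro_utf8.startswith(codecs.BOM_UTF8)
has_bom_sig = euro_with_bom.startswith(codecs.BOM_UTF8)

print(euro_utf8.hex(), euro_with_bom.hex(), has_bom_plain, has_bom_sig)
```

The plain encoding is just e282ac, matching the x'E282AC' value quoted in the original post, so the three bytes seen there are the character itself and not a BOM.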
My thoughts on this.
Kind regards,
Carel Teijgeler
On 30-7-2020 at 14:09, Joep Beckeringh via RPG400-L wrote:
Yes, I know. But in the case of ChrUtf the binary value contains 3,
while the string consists of 1 character (€) that uses 3 bytes. And the
documentation states that %len returns the number of characters.
Joep Beckeringh
Pantheon Automatisering BV
On 30-7-2020 at 13:13, Birgitta Hauser wrote:
A varying length field is always preceded by an (invisible) 2-byte (or
4-byte) binary value which represents the total length (number of
characters) of the content in the varying length field.
Best regards
Birgitta Hauser
"Shoot for the moon, even if you miss, you'll land among the stars."
(Les Brown)
"If you think education is expensive, try ignorance." (Derek Bok)
"What is worse than training your staff and losing them? Not
training them and keeping them!"
„Train people well enough so they can leave, treat them well enough
so they don't want to.“ (Richard Branson)
-----Original Message-----
From: RPG400-L <rpg400-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Joep Beckeringh via RPG400-L
Sent: Thursday, 30 July 2020 12:25
To: rpg400-l@xxxxxxxxxxxxxxxxxx
Cc: Joep Beckeringh <joep.beckeringh@xxxxxxxxxx>
Subject: %Subst and %len with UTF-8 data
Hello All,
While experimenting with CCSID(*UTF8) I found the following:
- I have a character variable BufIn with CCSID(*UTF8); containing
(amongst others) a Euro-sign (€; x'E282AC' in UTF-8)
- I have a variable length (length 10) character variable ChrUtf with
CCSID(*UTF8)
- When I try to get the Euro-sign into ChrUtf with 'ChrUtf =
%subst(BufIn: pos: 1)' ChrUtf contains x'0001E2'.
I am inclined to consider this a bug.
- I have another variable EurUtf - char(3) ccsid(*utf8) - which I
initialized with X'E282AC' (€)
- After 'ChrUtf = EurUtf' ChrUtf contains x'0003E282AC' and
%len(ChrUtf) yields 3; the number of bytes in the string; not the
number of characters.
I am inclined to consider this a bug as well.
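The x'0001E2' result is what you would get if the substring length were counted in bytes rather than characters. The Python sketch below (not RPG) shows the difference between the two interpretations; the contents of `buf_in` are hypothetical, since the post does not give the full value of BufIn.

```python
# Hypothetical BufIn contents containing a Euro sign
buf_in = "Price: \u20ac 10".encode("utf-8")

# Byte-based substring of length 1 at the Euro sign: only the lead byte
pos = buf_in.index(b"\xe2\x82\xac")  # 0-based byte offset
byte_slice = buf_in[pos:pos + 1]

# A lone lead byte is not valid UTF-8 on its own
try:
    byte_slice.decode("utf-8")
    byte_slice_valid = True
except UnicodeDecodeError:
    byte_slice_valid = False

# Character-based substring of length 1: the whole 3-byte character
text = buf_in.decode("utf-8")
char_pos = text.index("\u20ac")
char_slice = text[char_pos:char_pos + 1].encode("utf-8")

print(byte_slice.hex(), byte_slice_valid, char_slice.hex())
```

The byte-based slice yields only e2, which matches the data byte in x'0001E2', while the character-based slice yields e282ac.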
I guess we need Barbara Morris to shed some light on this.
Joep Beckeringh
Pantheon Automatisering BV
--
This is the RPG programming on IBM i (RPG400-L) mailing list To post
a message email: RPG400-L@xxxxxxxxxxxxxxxxxx To subscribe,
unsubscribe, or change list options,
visit: https://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives at
https://archive.midrange.com/rpg400-l.
Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription
related questions.
Help support midrange.com by shopping at amazon.com with our
affiliate link: https://amazon.midrange.com
--
Thanks and Regards,
Bruce
931-505-1915
This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.