Well, I started testing with CgiConvMode binary in the HTTP server. I entered a string containing € and some Chinese characters and found that it came in as URL-encoded ASCII (or UTF-8). I then used cvtch to decode it to UTF-8 and wrote it to stdout, and everything appeared as it should in the browser. So far, so good.
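
To illustrate what arrived from the browser, here is a minimal Python sketch (not RPG, and not the cvtch call used above) of how a string with € and Chinese characters becomes pure-ASCII percent-encoded text and decodes back to UTF-8 bytes. The sample string is an assumption chosen for illustration.

```python
from urllib.parse import quote, unquote_to_bytes

# Sample input: a Euro sign, a space and two Chinese characters (illustrative only).
text = "€ 中文"

# Percent-encoding turns every UTF-8 byte into %XX, so the result is plain ASCII.
encoded = quote(text, safe="")

# Decoding the %XX pairs gives back the raw UTF-8 bytes ...
raw = unquote_to_bytes(encoded)

# ... which can then be interpreted as UTF-8 text.
decoded = raw.decode("utf-8")

print(encoded)
print(raw.hex().upper())
print(decoded)
```

Because the encoded form contains only ASCII characters, it survives a move through a single-byte code page; the decoded bytes do not.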

Because the URL-encoded ASCII only contained 'non-problematic' values, I moved it to an EBCDIC string for decoding. That is easy to do: 'for I = 1 to length; char is %subst(string: I: 1)'. Then I wondered: would that work for UTF-8? So I tested it, and it didn't work.
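
As a rough illustration of why a one-position-at-a-time loop fails on UTF-8, the following Python sketch walks a UTF-8 byte string character by character by looking at each lead byte. It assumes well-formed UTF-8 input; the helper names utf8_char_len and split_utf8 are made up for this example.

```python
def utf8_char_len(lead: int) -> int:
    """Return how many bytes a UTF-8 character occupies, judged by its lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: plain ASCII
    if 0xC0 <= lead < 0xE0:
        return 2          # 110xxxxx
    if 0xE0 <= lead < 0xF0:
        return 3          # 1110xxxx (the Euro sign falls here)
    if 0xF0 <= lead < 0xF8:
        return 4          # 11110xxx
    raise ValueError(f"not a UTF-8 lead byte: {lead:#04x}")


def split_utf8(data: bytes) -> list:
    """Split well-formed UTF-8 bytes into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_char_len(data[i])
        chars.append(data[i:i + n])
        i += n
    return chars


pieces = split_utf8("a€中".encode("utf-8"))
print([p.hex() for p in pieces])
```

A loop that always advances by one byte would hand out the three bytes of € separately, and none of them is a valid character on its own.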

As for values in debug: see my first mail. EurUtf and ChrUtf both show € in the browser.

Joep Beckeringh

On 30 Jul 2020, at 23:21, Carel <coteijgeler@xxxxxxxxx> wrote:

IIRC, CCSID 1208 (UTF-8) does not support BOM by default; that should be CCSID 1209 (and you guessed it: not supported on the system).

On the IFS you can start a stream file with the BOM (3 hex values), and then characters that would otherwise not be accepted in UTF-8 are accepted.

But program variables are plain UTF-8; the BOM would have to be included for that to work.
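
For reference, a small Python sketch of what the UTF-8 BOM looks like at the byte level. It shows general Unicode behaviour only and makes no claim about how IBM i treats CCSID 1208 or 1209.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# The Euro sign is a separate three-byte sequence, E2 82 AC.
euro = "€".encode("utf-8")

# A stream that starts with the BOM, followed by the Euro sign.
with_bom = bom + euro

# "utf-8-sig" strips a leading BOM; plain "utf-8" keeps it as U+FEFF.
stripped = with_bom.decode("utf-8-sig")
kept = with_bom.decode("utf-8")

print(bom.hex().upper(), euro.hex().upper())
print(repr(stripped), repr(kept))
```

The bytes of the Euro sign are the same with or without a BOM in front of them.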

How does it appear on the screen? In debug? Copied to the IFS (CCSID 1208)?

Just wondering.

On 30-7-2020 at 23:04, Joep Beckeringh via RPG400-L wrote:
RPG nowadays can handle UCS-2 (data type ucs2; CCSID 13488), UTF-16 (data type ucs2; CCSID 1200) and UTF-8 (data type char; CCSID 1208). I was dealing with UTF-8.

The nice thing with UCS-2 (and single-byte EBCDIC) is that every character takes a fixed number of bytes. In UTF-8 and UTF-16 the number of bytes per character varies. I can imagine that this causes problems with the binary length value in varying-length fields.
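
The difference in byte counts can be checked with a short Python sketch. UTF-16 (big-endian) stands in for UCS-2 here, on the assumption that the two agree for characters in the Basic Multilingual Plane; the helper name byte_widths is made up for this example.

```python
def byte_widths(ch: str) -> dict:
    """Return the encoded size in bytes of one character in UTF-8 and UTF-16."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be")}


# An ASCII letter, the Euro sign, and a character outside the BMP.
for ch in ("A", "€", "😀"):
    print(repr(ch), byte_widths(ch))
```

Only the last character needs four bytes (a surrogate pair) in UTF-16, which is the case that UCS-2 cannot represent at all.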

Joep Beckeringh


On 30 Jul 2020, at 22:26, Bruce Vining <bruce.vining@xxxxxxxxx> wrote:

I do not know how "exacting" the RPG documentation is, but I will point out
that UCS-2 <> UTF8.

On Thu, Jul 30, 2020 at 3:34 PM Vernon Hamberg <vhamberg@xxxxxxxxxxxxxxx>
wrote:

I think you are right, Carel. The problem with UTF-8 is that the BOM is
optional. Just another thing that makes this game so much fun!

Vern

On 7/30/2020 11:09 AM, Carel wrote:
I think you are dealing with the Byte Order Mark in UTF-8, which is
active for characters that need more than 1 byte to be represented.

But it is still 1 character.

My thoughts on this.

Kind regards,

Carel Teijgeler

On 30-7-2020 at 14:09, Joep Beckeringh via RPG400-L wrote:
Yes, I know. But in the case of ChrUtf the binary value contains 3,
while the string consists of 1 character (€) that uses 3 bytes. And the
documentation states that %len returns the number of characters.

Joep Beckeringh
Pantheon Automatisering BV


On 30-7-2020 at 13:13, Birgitta Hauser wrote:
A varying-length field is always preceded by an (invisible) 2- (or
4-) byte binary value which represents the total length (number of
characters) of the content in the varying-length field.
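
A minimal Python sketch of such a layout follows. It assumes a 2-byte big-endian prefix that counts bytes, as the x'0003E282AC' dump in the original post suggests; the helper names to_varying and from_varying are made up for this example.

```python
import struct


def to_varying(data: bytes) -> bytes:
    """Prepend a 2-byte big-endian length (in bytes) to the data."""
    return struct.pack(">H", len(data)) + data


def from_varying(buf: bytes) -> bytes:
    """Read the 2-byte prefix and return that many bytes of content."""
    (length,) = struct.unpack_from(">H", buf, 0)
    return buf[2:2 + length]


field = to_varying("€".encode("utf-8"))
print(field.hex().upper())
print(from_varying(field).decode("utf-8"))
```

Under that assumption the prefix for a single Euro sign is 3, not 1.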

Best regards

Birgitta Hauser


-----Original Message-----
From: RPG400-L <rpg400-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Joep Beckeringh via RPG400-L
Sent: Thursday, 30 July 2020 12:25
To: rpg400-l@xxxxxxxxxxxxxxxxxx
Cc: Joep Beckeringh <joep.beckeringh@xxxxxxxxxx>
Subject: %Subst and %len with UTF-8 data

Hello All,

While experimenting with CCSID(*UTF8) I found the following:

- I have a character variable BufIn with CCSID(*UTF8); containing
(amongst others) a Euro-sign (€; x'E282AC' in UTF-8)
- I have a variable length (length 10) character variable ChrUtf with
CCSID(*UTF8)
- When I try to get the Euro-sign into ChrUtf with 'ChrUtf =
%subst(BufIn: pos: 1)' ChrUtf contains x'0001E2'.

I am inclined to consider this a bug.
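
One possible reading of the x'0001E2' result is that the substring was taken by byte position rather than by character. The Python sketch below shows the same effect; the buffer contents are hypothetical stand-ins for BufIn, and the offset is 0-based here, whereas RPG positions are 1-based.

```python
# Hypothetical buffer standing in for BufIn; the Euro sign starts at byte offset 3.
buf_in = "10 € total".encode("utf-8")
pos = 3

# A byte-based "substring of length 1" picks up only the lead byte.
one_byte = buf_in[pos:pos + 1]

# Taking all three bytes yields the complete character.
one_char = buf_in[pos:pos + 3]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(one_byte.hex().upper(), is_valid_utf8(one_byte))
print(one_char.hex().upper(), is_valid_utf8(one_char))
```

The single byte E2 is only the start of a three-byte sequence, so it is not valid UTF-8 by itself.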

- I have another variable EurUtf - char(3) ccsid(*utf8) - which I
initialized with X'E282AC' (€)
- After 'ChrUtf = EurUtf' ChrUtf contains x'0003E282AC' and
%len(ChrUtf) yields 3; the number of bytes in the string; not the
number of characters.

I am inclined to consider this a bug as well.

I guess we need Barbara Morris to shed some light on this.

Joep Beckeringh
Pantheon Automatisering BV

--
This is the RPG programming on IBM i (RPG400-L) mailing list To post
a message email: RPG400-L@xxxxxxxxxxxxxxxxxx To subscribe,
unsubscribe, or change list options,
visit: https://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives at
https://archive.midrange.com/rpg400-l.

Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription
related questions.

Help support midrange.com by shopping at amazon.com with our
affiliate link: https://amazon.midrange.com



--
Thanks and Regards,
Bruce
931-505-1915




This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.