


I'm undoubtedly missing something, but I still don't understand your
concern.

As you indicated, UTF-16 can require 1 or 2 16-bit code units,
representing 2 or 4 bytes of storage. UTF-8, in the same manner, can
require 1, 2, 3, or 4 8-bit code units, representing 1, 2, 3, or 4
bytes of storage. Both encodings can represent the full range of
characters in the currently defined Unicode standard. Prior to 2003, UTF-8
could, in theory, use more than 4 8-bit code units (up to 6), but that
capability was never used and was formally removed from the standard in 2003.
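Here is a minimal Python sketch of the byte counts described above. Python is used only for illustration, and the helper names `utf8_length` and `utf16_length` are hypothetical. The ranges follow the current Unicode limit of U+10FFFF, and each result is cross-checked against Python's built-in encoders:

```python
def utf8_length(cp: int) -> int:
    """Bytes needed to encode code point cp in UTF-8."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:
        return 1   # ASCII range
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4       # U+10000..U+10FFFF (the 5- and 6-byte forms were dropped in 2003)


def utf16_length(cp: int) -> int:
    """Bytes needed in UTF-16: one 16-bit code unit, or two for a surrogate pair."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    return 2 if cp <= 0xFFFF else 4


for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    cp = ord(ch)
    # Cross-check the arithmetic against Python's own encoders.
    assert utf8_length(cp) == len(ch.encode("utf-8"))
    assert utf16_length(cp) == len(ch.encode("utf-16-be"))
    print(f"U+{cp:04X}: UTF-8 {utf8_length(cp)} bytes, UTF-16 {utf16_length(cp)} bytes")
```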

The default for RPG is UCS-2 (CCSID 13488), which is, unfortunately, based
on an earlier form of Unicode where a fixed-width 16-bit code unit was
used (no surrogate pair support, so no second 16-bit code unit). On the
RPG H-spec you can, however, specify CCSID(*UCS2:1200), which tells RPG that
UTF-16 is to be used as the default Unicode CCSID. It's unfortunate that
the RPG keyword is UCS2 when it's really more than UCS-2, but that's what
happens with keywords sometimes when the world changes. Barbara -- am I
missing something about the RPG implementation of CCSID 1200 support?
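The "second 16-bit code unit" is a surrogate. This rough Python sketch shows the arithmetic that UTF-16 uses and fixed-width UCS-2 lacks. The function names are hypothetical. The sketch does not say how any particular RPG built-in treats a surrogate pair:

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    v = cp - 0x10000                      # 20-bit offset
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)    # U+1F600 GRINNING FACE
print(f"{high:04X} {low:04X}")            # D83D DE00
# A UCS-2-only reader sees two separate 16-bit values here;
# a UTF-16 reader recombines them into one character.
assert from_surrogate_pair(high, low) == 0x1F600
```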

UTF-8 is typically the preferred encoding for web applications, but that's
largely due to its ASCII transparency and not due to support for a larger
character set (when compared to UTF-16).
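This small Python illustration shows what "ASCII transparency" means in practice. The sample strings are arbitrary:

```python
ascii_text = "Price: 100 EUR "
utf8 = (ascii_text + "\u20ac").encode("utf-8")

# Every ASCII character is stored as the very same single byte in UTF-8 ...
assert utf8.startswith(ascii_text.encode("ascii"))

# ... and every byte of a multi-byte sequence has the high bit set, so a byte
# in x'00'-x'7F' inside UTF-8 data can only ever be an ASCII character.
euro = "\u20ac".encode("utf-8")
print(euro.hex(" "))                          # e2 82 ac
assert all(b >= 0x80 for b in euro)

# UTF-16, by contrast, pads even plain ASCII with zero bytes.
print("Price".encode("utf-16-be").hex(" "))   # 00 50 00 72 00 69 00 63 00 65
```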



On Tue, Jan 14, 2014 at 7:37 AM, Henrik Rützou <hr@xxxxxxxxxxxx> wrote:

Bruce

UTF-16 is a 2 OR 4 byte Unicode encoding. Unicode has a total of 1,114,112
code points.

In other words, you sometimes need 2 DBCS characters to represent a single
Unicode code point, so 10 Unicode characters may take up 15 DBCS characters.

Does RPGLE support that? The answer is NO.
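This Python sketch illustrates the 10-versus-15 arithmetic with a hypothetical sample string of five BMP letters and five supplementary-plane emoji. Python's `len()` counts code points. The UTF-16 byte length divided by 2 gives the number of 16-bit code units:

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units s occupies when encoded as UTF-16."""
    return len(s.encode("utf-16-be")) // 2


# Hypothetical sample: five BMP letters plus five emoji (U+1F600..U+1F604).
sample = "abcde" + "".join(chr(cp) for cp in range(0x1F600, 0x1F605))
print(len(sample), utf16_units(sample))   # 10 15
```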




On Tue, Jan 14, 2014 at 12:12 AM, Bruce Vining <bvining@xxxxxxxxxxxxxxx> wrote:

I do not understand this second note at all. CCSID 1200 gives you the full
UTF-16 range and is what I generally use.

An SBCS job environment is going to limit you to 192 discrete EBCDIC
characters if you ask for the Unicode data to be converted to the job
CCSID. But it's your code (someplace) that's asking for that conversion,
so don't ask.

If transforms are needed, there are APIs such as iconv() and
QlgTransformUCSData (with support for UTF-8, UTF-16, and UTF-32). Either of
these APIs (plus others) could easily be wrapped within a user function,
hiding their implementation from the application developer.
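Here is a rough Python sketch of such a wrapper. Python's codec machinery stands in for iconv() or QlgTransformUCSData. The function name and the CCSID-to-codec table are assumptions made for this example, including the mapping of CCSID 1200 to big-endian UTF-16:

```python
# Assumed CCSID-to-codec table covering only the CCSIDs discussed in this thread.
CCSID_CODECS = {
    37: "cp037",        # SBCS EBCDIC, US/Canada
    1200: "utf-16-be",  # UTF-16 (assumed big-endian)
    1208: "utf-8",      # UTF-8
}


def convert_ccsid(data: bytes, from_ccsid: int, to_ccsid: int) -> bytes:
    """Convert raw bytes between two CCSIDs.

    Callers pass only CCSIDs; which conversion service does the work stays hidden.
    Raises KeyError for an unmapped CCSID and UnicodeError for unconvertible data.
    """
    text = data.decode(CCSID_CODECS[from_ccsid])
    return text.encode(CCSID_CODECS[to_ccsid])


utf8 = "Caf\u00e9 \U0001F600".encode("utf-8")
utf16 = convert_ccsid(utf8, 1208, 1200)
print(utf16.hex(" "))   # 00 43 00 61 00 66 00 e9 00 20 d8 3d de 00
assert convert_ccsid(utf16, 1200, 1208) == utf8   # lossless round trip
```

The application code only ever sees `convert_ccsid`, so the underlying service could be swapped without touching callers.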


On Mon, Jan 13, 2014 at 8:38 AM, Henrik Rützou <hr@xxxxxxxxxxxx> wrote:

Joep,

CCSID 1200 or 13488 doesn't really give you full Unicode support in RPGLE
unless your source or result is UTF-8 and you use iconv on the raw bytes to
convert between the formats.

Iconv will correctly convert large characters (3- and 4-byte UTF-8
sequences) into UTF-16 (CCSID 1200), including the 4-byte ones that need
2*2 bytes (a surrogate pair), since it is a "calculated" conversion that
isn't based on a translation table.

In other words, you can calculate the hex conversion across the full
Unicode range between UTF-8, UTF-16, and UTF-32.
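This Python sketch performs such a "calculated" conversion from UTF-8 to big-endian UTF-16, using bit arithmetic only and no translation table. It is illustrative, not iconv's actual implementation. For brevity it skips some validation (overlong forms, encoded surrogates):

```python
def utf8_to_utf16be(data: bytes) -> bytes:
    """Convert UTF-8 to UTF-16 (big-endian) using bit arithmetic only, no table.

    Simplified: rejects bad lead/continuation bytes, truncated sequences and
    values above U+10FFFF, but does not check overlong forms or encoded surrogates.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:                 # 0xxxxxxx: 1 byte (ASCII)
            cp, n = b0, 1
        elif b0 >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
            cp, n = b0 & 0x1F, 2
        elif b0 >> 4 == 0b1110:       # 1110xxxx: 3-byte sequence
            cp, n = b0 & 0x0F, 3
        elif b0 >> 3 == 0b11110:      # 11110xxx: 4-byte sequence
            cp, n = b0 & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte {b0:#04x} at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + n]:
            if b >> 6 != 0b10:        # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)   # append 6 payload bits
        if cp > 0x10FFFF:
            raise ValueError(f"code point beyond U+10FFFF at offset {i}")
        i += n
        if cp <= 0xFFFF:              # BMP: a single 16-bit code unit
            out += cp.to_bytes(2, "big")
        else:                         # supplementary plane: surrogate pair
            v = cp - 0x10000
            out += (0xD800 | (v >> 10)).to_bytes(2, "big")
            out += (0xDC00 | (v & 0x3FF)).to_bytes(2, "big")
    return bytes(out)


sample = "A\u00e9\u20ac\U0001F600"     # 1-, 2-, 3- and 4-byte UTF-8 sequences
print(utf8_to_utf16be(sample.encode("utf-8")).hex(" "))   # 00 41 00 e9 20 ac d8 3d de 00
```

The same bit patterns run in reverse give UTF-16 to UTF-8, and UTF-32 is simply the code point `cp` stored in 4 bytes.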

The problem is that these string conversions aren't natively supported by
RPGLE as a field type; you have to use raw storage manipulation with iconv
to achieve it.

Basically, UTF-8 is a byte string that shares x'00'-x'7F' with ASCII, but
it would be nice just to be able to move incoming or outgoing UTF-8
directly to/from a field type without conversions.

UTF-8 can be converted to SBCS EBCDIC in two ways: with a "normal" iconv
from CCSID 1208 to 37, which will only support the 256 characters in the
SBCS EBCDIC CCSID, or at the byte level.
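This Python sketch shows the first ("normal") route, with Python's `cp037` codec standing in for CCSID 37. The helper name is hypothetical. A single-byte code page tops out at 256 characters, so anything outside it cannot survive the conversion. The euro sign, which CCSID 37 lacks, is one example:

```python
def utf8_to_sbcs(data: bytes, codec: str = "cp037") -> bytes:
    """Convert UTF-8 to a single-byte EBCDIC code page (default: cp037 for CCSID 37).

    errors="strict" makes characters outside the code page fail loudly
    instead of being silently substituted.
    """
    return data.decode("utf-8").encode(codec, errors="strict")


# One byte per character caps the repertoire at 256 code points.
repertoire = set(bytes(range(256)).decode("cp037"))
print(len(repertoire))                                      # 256

print(utf8_to_sbcs("Caf\u00e9".encode("utf-8")).hex(" "))   # c3 81 86 51

try:
    utf8_to_sbcs("5 \u20ac".encode("utf-8"))                # euro sign: not in CCSID 37
except UnicodeEncodeError as exc:
    print("cannot convert:", repr(exc.object[exc.start:exc.end]))
```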

At the moment I'm working on a replacement for powerEXT Core, a CGIDEV2
SBCS hybrid, where new middleware will have full Unicode, SBCS, and DBCS
support.

My problem is that neither SBCS nor DBCS "original" has that support in
DB2 fields, unless I have overlooked something.






On Mon, Jan 13, 2014 at 2:52 PM, <j.beckeringh@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Henrik,

What exactly are you looking for? Do you want to use Unicode in RPG or do
you specifically want to use UTF-8 encoding in RPG? Using Unicode is simple
enough through UCS-2 encoding (datatype C; CCSID 1200 or 13488 as Bruce
mentioned; implicit conversion by assignment or explicit conversion by
%ucs2 and %char).

Joep Beckeringh



Henrik Rützou <hr@xxxxxxxxxxxx>

Re: DB2 UTF-8 fields used in RPGLE

Unless I have overlooked something, the RPGLE UTF-8 field support is
more or less useless, since in reality it only supports characters in the
job's SBCS EBCDIC CCSID :-(

It would be far better if the DB just passed the data "as is" as bytes,
so it could be moved either to the job's SBCS EBCDIC field or to a DBCS
field by using a %BIF.

Why on earth didn't IBM just copy the DBCS support to UTF-8 support?
Maybe Barbara Morris can answer that question.
--
This is the RPG programming on the IBM i (AS/400 and iSeries)
(RPG400-L)
mailing list
To post a message email: RPG400-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/rpg400-l.




--
Regards,
Henrik Rützou

http://powerEXT.com <http://powerext.com/>




--
Regards,
Bruce
www.brucevining.com
www.powercl.com




--
Regards,
Henrik Rützou

http://powerEXT.com <http://powerext.com/>






This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
