Re: can't define hidden utf-16 field in a subfile -- RPG400-L

On 16-Mar-2015 07:02 -0500, Marco Benetti wrote:

On 2015-03-09 19:13 GMT+01:00 Booth Martin wrote:

On 3/9/2015 5:13 AM, Marco Benetti wrote:

On 2015-03-06 16:36 GMT+01:00 Wilson, Jonathan wrote:

On 05-Mar-2015 16:43 -0600, Marco Benetti wrote:

If I set CCSID at the file level, my display file is created. But
if I do not set the CCSID attribute at the field level, the field will
not be UTF-16 but graphic (once my RPGLE program is compiled,
the field has data type G, not C).
So that does not solve my problem.
I need a UTF-16 hidden field.

As the field is hidden, the CCSID is not required because no
translation is performed. <<SNIP>>

I'm not sure you are right.
I have a DB field that is UTF-16.
I put it in my hidden field. In the compile listing of my RPGLE program the
hidden field has data type A, not C, so I think there is a
conversion. The eval operation does the conversion.

Why not eliminate the step? Why is it required that there be a
hidden field on the display? Load the field into a single-field
array indexed by the subfile record number and be done with the
issue.

It seems the only way. <<SNIP>>
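
FWiW, the array approach suggested above might be sketched as follows. This is Python rather than RPG, purely for illustration; the class and method names are hypothetical. It only models the idea of keeping the UTF-16 value in program storage, indexed by subfile record number, instead of in a hidden display-file field.

```python
class SubfileSideStore:
    """Hypothetical side array: one Unicode value per subfile record."""

    def __init__(self):
        # List index 0 corresponds to subfile record number (RRN) 1.
        self._values = []

    def add(self, value):
        """Store the value for the next subfile record and return its RRN."""
        self._values.append(value)
        return len(self._values)

    def get(self, rrn):
        """Return the value stored for the given 1-based RRN."""
        if not 1 <= rrn <= len(self._values):
            raise IndexError(f"no subfile record {rrn}")
        return self._values[rrn - 1]


store = SubfileSideStore()
rrn = store.add("Gr\u00fc\u00dfe")   # value read from the UTF-16 DB field
print(rrn, store.get(rrn))           # look up again when the record is selected
```

The value never passes through the display file, so no CCSID conversion is applied to it.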

A couple of comments, FWiW only; my reply is not meant to elicit follow-up, so any /questions/ are merely rhetorical:

• Regarding the file-level keyword being allowed while the field-level keyword is disallowed: I also noted that specifying CCSID at the record level gave no error. Might that usage limit the scope of the effect?

• As to the hidden field being type G rather than C when the file-level CCSID keyword is used: AFaIK that is *correct* for UTF-16; i.e. both CCSID(1200) and CCSID(13488), as variants of UTF-16, must be G. It is UTF-8 that remains /character/ type. See the doc links included below.

• The error message clearly does occur. Still, no doc explicitly notes a restriction at the field level, and there is apparent support at the record-format level, so that might be sufficient to posit that the error is a defect. That is, report the error as a probable defect [if not in the code, then argue the error is in the docs], or submit a comment on the docs (suggesting that the restriction goes unmentioned). As with a defect report, a response from IBM would conclude the issue more definitively than just /giving up/.

Please see the following doc references [links followed by quoted snippets], which support my comments about the CCSID. Notably, if there is a "DB field that is UTF-16", then that must be data type Graphic with either CCSID(1200) for UTF-16 or CCSID(13488) for UCS-2, given that the "system treats both CCSID 13488 and CCSID 1200 as UTF-16 encodings.":

<http://www.ibm.com/support/knowledgecenter/api/content/ssw_ibm_i_71/rzakb/ucs2ap.htm>
_Unicode considerations for database files_
"...

These transformation formats (encoding forms) of Unicode are supported with physical and logical file DDS:

• UTF-8 is an 8-bit encoding form designed for ease of use with existing ASCII-based systems. UTF-8 data is stored in character data types. The CCSID value for data in UTF-8 format is 1208.

A UTF-8 code unit is 1 byte in length. A UTF-8 character can be 1, 2, 3, or 4 code units in length. A UTF-8 data string can contain any character, including surrogates and combining characters.

• UTF-16 is a 16-bit encoding form designed to provide code values for over a million characters, and a superset of UCS-2. UTF-16 data is stored in graphic data types. The CCSID value for data in UTF-16 format is 1200.

A UTF-16 code unit is 2 bytes in length. A UTF-16 character can be 1 or 2 code units (2 or 4 bytes) in length. A UTF-16 data string can contain any character, including UTF-16 surrogates and combining characters.

• UCS-2 is the Universal Character Set coded in 2 octets, which means that characters are represented in 16-bits per character. UCS-2 data is stored in graphic data types. The CCSID value for data in UCS-2 format is 13488.

UCS-2 is a subset of UTF-16, and can no longer support all of the characters defined by Unicode. UCS-2 is identical to UTF-16, except that UTF-16 also supports combining characters and surrogates. If you do not need support for combining characters and surrogates, then you can choose to use the UCS-2 type, because there is more database functionality available for it.

Note: In this topic, references to UTF-16 imply UCS-2 as well.
..."
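
To make the code-unit counts in that snippet concrete, here is a minimal sketch in Python (not RPG or DDS; the helper name `code_units` is my own, and the sample characters are arbitrary). It just counts how many UTF-8 and UTF-16 code units a single character occupies.

```python
def code_units(ch):
    """Return (UTF-8 code units, UTF-16 code units) for one character."""
    utf8_units = len(ch.encode("utf-8"))             # 1 byte per code unit
    utf16_units = len(ch.encode("utf-16-be")) // 2   # 2 bytes per code unit
    return utf8_units, utf16_units


# U+0041, U+00E9, U+20AC, and U+1D11E (outside the 16-bit range)
for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
    u8, u16 = code_units(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {u8} unit(s), UTF-16 {u16} unit(s)")
```

Only the last sample needs two UTF-16 code units (a surrogate pair), which is the case UCS-2 cannot represent.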

<http://www.ibm.com/support/knowledgecenter/api/content/ssw_ibm_i_71/nls/rbagsucs2.htm>
_UCS-2 and its relationship to Unicode_ (UTF-16)
"The UCS-2 standard, an early version of Unicode, is limited to 65 535 characters. However, the data processing industry needs over 94 000 characters; the UCS-2 standard has been superseded by the Unicode UTF-16 standard.

The IBM® i operating system supports CCSID 13488, defined as UCS-2, and CCSID 1200, defined as UTF-16. The system treats both CCSID 13488 and CCSID 1200 as UTF-16 encodings.

Using either scheme, you will have the same results for almost all system operations. However, certain SQL functions that operate on a character boundary defined by the SQL standard can produce different results. For instance, the SQL functions of CHARACTER, LENGTH, POSITION, and SUBSTRING distinguish UTF-16 and UCS-2, and therefore you get different results. See the SQL reference for more information about these functions.
..."
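
As a rough illustration of why length and substring results can diverge, the Python sketch below treats the same data two ways: counting 16-bit code units (a UCS-2 view) versus counting whole characters (a UTF-16 view). The function names are hypothetical and this is only a model of the idea; it does not reproduce how the SQL functions are actually implemented.

```python
def ucs2_length(s):
    """Length in 16-bit code units (UCS-2 view)."""
    return len(s.encode("utf-16-be")) // 2


def utf16_length(s):
    """Length in characters (UTF-16 view); a surrogate pair counts once."""
    return len(s)


def ucs2_substring(s, start, length):
    """1-based substring by code units; bytes, since a pair may be split."""
    data = s.encode("utf-16-be")
    return data[(start - 1) * 2:(start - 1 + length) * 2]


def utf16_substring(s, start, length):
    """1-based substring by whole characters."""
    return s[start - 1:start - 1 + length]


text = "a\U0001D11Eb"   # the middle character needs a surrogate pair
print(ucs2_length(text), utf16_length(text))
print(ucs2_substring(text, 2, 1).hex(), utf16_substring(text, 2, 1))
```

In the code-unit view, the one-unit substring starting at position 2 is half of a surrogate pair, which is not a valid character on its own.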

<http://www.ibm.com/support/knowledgecenter/api/content/ssw_ibm_i_71/nls/rbagsutf16.htm>
_UTF-16_

As with the first of the doc links above (the one regarding database files), there is a similar topic for display files:

<http://www.ibm.com/support/knowledgecenter/api/content/ssw_ibm_i_71/rzakc/dspfil.htm>
_Unicode considerations for display files_
"Unicode is a universal encoding scheme for written characters and text that enables the exchange of data internationally. Two transformation formats, UTF_16 and UCS_2, of Unicode are supported with DDS.

A Unicode field in a display file can contain UCS-2 or UTF-16 data. Unicode data is composed of code units, which represent the minimal byte combination that can represent a unit of text.

There are two transformation formats (encoding forms) of Unicode that are supported with DDS:

• UTF-16 is a 16-bit encoding form designed to provide code values for over a million characters and a superset of Unicode. UTF-16 data is stored in graphic data types. The CCSID value for data in UTF-16 format is 1200.

A UTF-16 code unit is 2 bytes in length. A UTF-16 character can be 1 or 2 code units (2 or 4 bytes) in length. A UTF-16 data string can contain any character including UTF-16 surrogates and combining characters.

• UCS-2 is the Universal Character Set coded in 2 octets, which means that characters are represented in 16 bits per character. One code unit is used in this topic to describe the size of a UCS-2 character. UCS-2 data is stored in graphic data types. The CCSID value for data in UCS-2 format is 13488.

UCS-2 is a subset of UTF-16 and can no longer support all of the characters defined by Unicode. UCS-2 is identical to UTF-16 except that UTF-16 also supports the combining of characters and surrogates. If you do not need support for the combining of characters and surrogates, you can choose to continue to use the UCS-2 format.

Unicode data is not supported on display devices that currently support the 5250 data stream. Therefore, conversions between the Unicode data and EBCDIC are necessary during input and output. On output, the Unicode data is converted to the CCSID of the device. On input, the data is converted from the device CCSID to the Unicode CCSID.
..."
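
The output and input conversions described in that last paragraph can be modeled loosely in Python. The assumptions here are all mine: the device CCSID is taken to be 37 (Python codec `cp037`), the Unicode data is taken to be big-endian 16-bit code units, and the function names are hypothetical. It is a sketch of the round trip, not of what the workstation support actually does.

```python
DEVICE_CODEC = "cp037"        # assumption: device CCSID 37 (EBCDIC)
UNICODE_CODEC = "utf-16-be"   # assumption: big-endian 16-bit code units


def to_device(unicode_bytes):
    """Output direction: Unicode field data -> device CCSID."""
    text = unicode_bytes.decode(UNICODE_CODEC)
    # Characters the device code page lacks are replaced. Python substitutes
    # '?'; the substitution character used by the real system may differ.
    return text.encode(DEVICE_CODEC, errors="replace")


def from_device(device_bytes):
    """Input direction: device CCSID -> Unicode field data."""
    return device_bytes.decode(DEVICE_CODEC).encode(UNICODE_CODEC)


field = "HELLO".encode(UNICODE_CODEC)
print(to_device(field).hex())               # EBCDIC bytes sent to the device
print(from_device(to_device(field)) == field)

lossy = "\u65e5".encode(UNICODE_CODEC)      # a character not in cp037
print(from_device(to_device(lossy)) == lossy)
```

The point of the last line is that a character with no equivalent in the device CCSID does not survive the round trip, which is one reason to keep such data out of the display file altogether.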