Re: Double byte from EBCDIC to ASCII. (Tom L. Deskevich) -- MIDRANGE-L

On 17-Oct-2016 11:33 -0500, Tom L. Deskevich wrote:

I have to say I do not know the double byte character set.

DBCS differs little from SBCS where EBCDIC CCSIDs are concerned; nearly all of them are language-specific. Just take a quick peek at a list of [Coded Character Set IDentifiers] (http://www.ibm.com/software/globalization/ccsid/ccsid_registered.html) for what each is /named/, to get an idea.

But that is the CCSID (65535) we use for file creation is our
standard for any language that uses symbols and such for the
language.

For an /installed/ file, I recall there was a special process [as part of database restore] whereby the file is implicitly tagged with the CCSID of the primary language of the system. Is the noted file actually a file built for, but not yet installed as, a packaged program product?

If that is the actual final form of the file, with the column intentionally tagged *HEX, then I see little reason to handle DBCS differently than SBCS, other than for the additional storage; i.e. if languages other than the Latin-like ones are represented with CCSID(65535), then consistency would have the Latin-like languages handled the same way.

I am creating the extract files, so I have some flexibility.

I think the application probably is best reviewed for the topics of /globalization/; e.g. some topics in the KC:
(https://www.ibm.com/support/knowledgecenter/search/globalization?scope=ssw_ibm_i_73)

so I should use CCSID 1209 for UTF8?

CCSID 1208 is for UTF8. For /global/ data, i.e. not specific to a language, a very encompassing CCSID is required for tagging the data in a column, or the data must be tagged with a CCSID in an alternate manner; e.g. as a stored value in another column, to identify what is the data in the otherwise BINARY/undescribed column of data for the same row.
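The alternate manner of tagging, a stored CCSID value in another column of the same row, can be sketched as follows. This is only an illustration in Python, not anything the database provides: the CCSID_TO_CODEC table and the decode_tagged function are hypothetical names, and the table covers just three CCSIDs for which Python happens to ship a codec.

```python
# Hypothetical lookup from a few CCSIDs to Python codec names.
# Only a tiny subset is listed; a real table would be much larger.
CCSID_TO_CODEC = {
    37: "cp037",    # EBCDIC, US English
    500: "cp500",   # EBCDIC, International Latin-1
    1208: "utf-8",  # UTF-8
}

def decode_tagged(ccsid, raw):
    """Decode the bytes of an undescribed column using the CCSID
    stored in a companion column of the same row."""
    try:
        codec = CCSID_TO_CODEC[ccsid]
    except KeyError:
        raise ValueError(f"no codec known for CCSID {ccsid}") from None
    return raw.decode(codec)

# Each row carries its own tag next to the binary data.
rows = [
    (37, "Hello".encode("cp037")),
    (500, "[x]".encode("cp500")),
    (1208, "Hello".encode("utf-8")),
]
for ccsid, raw in rows:
    print(ccsid, raw.hex(), decode_tagged(ccsid, raw))
```

The same text is stored as different bytes per row, and only the companion tag makes each row decodable.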

I am not saying CCSID(*HEX) cannot be used in any particular application, just that any utilities that reference the column data, and that depend on properly tagged data, are not going to function in a generally desirable manner. That is similar to using program-described files with generic utilities [like FTP]: a text-mode transfer between encoding schemes [or even between code pages within the same encoding scheme] is not always going to do nice things to the non-text [binary] data, and with a binary/image-mode transfer, the transferred text data viewed in another encoding scheme is going to show as gibberish, even though the binary portion transports without any corruption.
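The text-mode versus binary-mode trade-off can be shown with a small Python sketch. The record layout here is an assumption made up for the illustration [five bytes of CCSID 37 text followed by a three-byte packed-decimal field], and the two functions only imitate what a transfer does; they are not FTP itself.

```python
# Assumed record layout: 5 bytes of EBCDIC (CCSID 37) text followed by
# a 3-byte packed-decimal field holding +12345 (0x12 0x34 0x5F).
PACKED = bytes([0x12, 0x34, 0x5F])
record = "HELLO".encode("cp037") + PACKED

def text_mode(data):
    """Translate every byte from EBCDIC 37 to ISO 8859-1,
    roughly what a text-mode transfer does to a whole record."""
    return data.decode("cp037").encode("latin-1")

def binary_mode(data):
    """Copy the bytes unchanged, as a binary/image-mode transfer does."""
    return bytes(data)

translated = text_mode(record)
copied = binary_mode(record)

print(translated[:5])        # text portion, now readable on the ASCII side
print(translated[5:].hex())  # packed field was translated too, so it changed
print(copied[5:].hex())      # packed field intact
print(copied[:5])            # text portion is still EBCDIC bytes
```

Neither mode handles the whole record correctly, because nothing describes which bytes are text and which are not.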

The CPYTOIMPF ignores all CCSID codes except 65535, so I guess I need
another utility to do this?

The Copy To Import File (CPYTOIMPF) essentially ignores only CCSID 65535 [aka *HEX]; i.e. the feature skips translation only when the column CCSID attribute [or data type] suggests not to translate. IOW, as an effectively /generic/ utility, the feature must know the column CCSID to effect desirable results; CCSID(*HEX) is a non-CCSID, an indication that *no CCSID translation should occur*.
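The rule a generic utility has to follow can be sketched in a few lines of Python. This is not how CPYTOIMPF is implemented; export_column and the CODECS table are hypothetical names, and the sketch only shows the decision: translate when the column has a real CCSID, pass the bytes through untouched when the tag is 65535.

```python
HEX_CCSID = 65535  # *HEX: an indication that no CCSID translation should occur

# Hypothetical, very partial lookup from CCSID to a Python codec name.
CODECS = {37: "cp037", 500: "cp500"}

def export_column(raw, ccsid, target="utf-8"):
    """Return the column bytes as they would be written to the export.

    A properly tagged column is translated to the target encoding;
    a *HEX column is copied as-is because its encoding is unknown.
    """
    if ccsid == HEX_CCSID:
        return raw
    return raw.decode(CODECS[ccsid]).encode(target)

data = "ABC".encode("cp037")
print(export_column(data, 37))         # translated, readable in the export
print(export_column(data, HEX_CCSID))  # same EBCDIC bytes, untranslated
```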

The client access method just doubles up the record length, to
produce the double byte information the way I read it.

Use of the "Client Access data transfer" with the "forced translation" option has to /assume/, somehow, what is the CCSID of the EBCDIC data that claims there should be no CCSID translation. If the data is Japanese character data but the user requesting the transfer is USEnglish with the default mixed CCSID of 937, the /force translate/ option will convert from 937->ASCII. But the actual data presumably was stored as 5026, despite being tagged as *HEX, so the effect on the transferred data is probably not just garbage, but garbage plus conversion errors.
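The effect of a wrong assumption can be reproduced on a small scale in Python. Python ships no codecs for the mixed CCSIDs 937 or 5026, so this sketch substitutes two single-byte EBCDIC code pages [500 as the encoding the data was really stored in, 37 as the one wrongly assumed]; force_translate is a hypothetical stand-in for the forced-translation option, not its actual logic.

```python
text = "[x]"
stored = text.encode("cp500")  # how the data was actually stored

def force_translate(raw, assumed_codec, target="ascii"):
    """Translate untagged bytes by assuming a source encoding.

    Characters with no equivalent in the target are replaced,
    which stands in for the conversion errors."""
    return raw.decode(assumed_codec).encode(target, errors="replace")

right = force_translate(stored, "cp500")  # assumption matches the data
wrong = force_translate(stored, "cp037")  # assumption does not match

print(right)
print(wrong)
```

With the wrong assumption, some bytes map to different characters and others to characters that the target cannot represent at all, the small-scale version of garbage plus conversion errors.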