Re: CPYFRMIMPF and Unicode - UTF-16 in particular -- MIDRANGE-L

I understand everything you are saying. But you forget what I said before - we do NOT know what the encoding should be. The source for these files is external, and if they come from a SAVE in Windows, they are tagged as 1252. I did try an FTP transfer, and that came in tagged as 819 - the default, I believe. I do know that a CCSID can be set for FTP, but I'm not going to train users to do that, or even another developer. We just need some way to know the content's encoding, and in this situation that is not available from the stream file itself.

So long as the risks stated in several posts here - a possible mismatch of ANSI or UTF-8 code points - are acceptable, I'm done.

If they are not acceptable, I will read in the data and determine the encoding, based either on a BOM or on locating a space character in either byte order (x'0020' for big-endian, x'2000' for little-endian).
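
To make that detection idea concrete, here is a rough sketch in Python rather than RPG, purely for illustration. The function name is made up, and the "count the spaces" rule is only a heuristic that assumes mostly western text containing ordinary spaces; it is not how any IBM i command or API decides.

```python
import codecs

def guess_utf16_byte_order(data: bytes) -> str:
    """Return 'utf-16-be', 'utf-16-le', or 'unknown' (hypothetical helper)."""
    # A BOM, when present, settles the question.
    if data.startswith(codecs.BOM_UTF16_BE):   # x'FEFF'
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # x'FFFE'
        return "utf-16-le"
    # No BOM: split into 2-byte units and count U+0020 in each byte order.
    units = [data[i:i + 2] for i in range(0, len(data) - 1, 2)]
    be_spaces = units.count(b"\x00\x20")
    le_spaces = units.count(b"\x20\x00")
    if be_spaces > le_spaces:
        return "utf-16-be"
    if le_spaces > be_spaces:
        return "utf-16-le"
    # No spaces found either way (e.g. single-byte data): do not guess.
    return "unknown"
```

A file with no BOM and no spaces comes back as "unknown", which is the case where somebody still has to decide whether the risk is acceptable.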

In order to change the CCSID of the stream file, I would have to do this anyhow - read it to guess its encoding. And if it IS little-endian - a very likely case - then I would use the transform API. And I'd use that, in any case, if the risks are not acceptable to us.

Bottom line - the work to determine encoding, and hence, the CCSID to change to, is needed in any case, whether using a CPY* command or the transform API. And because little-endian is very likely, and there is no CCSID for that, and no CPY* variant will process that correctly, the transform API is needed.

This all hinges on whether a certain risk is acceptable - which I have yet to hear. I will be asking, and I have stated the risk in a short paper describing the process.

If the risk is acceptable, using CPYFRMSTMF as described does a nice bytewise conversion to EBCDIC.

If the risk is not acceptable, I will read the data into RPG using IFS APIs, determine the encoding as best I can, then use the transform API to convert it to a usable encoding, then probably to EBCDIC.
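
As a sketch of that second path, the decode-then-encode step looks like this in Python. This is only an illustration: the real work would be done in RPG with the IFS and conversion APIs, the function name is hypothetical, and Python's "cp037" codec is assumed here as a stand-in for an EBCDIC CCSID such as 37.

```python
def utf16_to_ebcdic(data: bytes, byte_order: str,
                    ebcdic_codec: str = "cp037") -> bytes:
    """Convert UTF-16 bytes of a known byte order to EBCDIC (hypothetical helper).

    byte_order is 'utf-16-le' or 'utf-16-be', decided beforehand.
    """
    text = data.decode(byte_order)
    # These codecs keep a leading BOM as the character U+FEFF; drop it.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Characters with no EBCDIC equivalent become the codec's substitute.
    return text.encode(ebcdic_codec, errors="replace")
```

For example, the little-endian bytes for "A B" come out as x'C140C2', with no x'00' bytes left over.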

I hope that clarifies things - again, I do understand what you are saying, it just doesn't easily apply to this situation, so far as I can tell.

Thanks
Vern

On 3/4/2013 6:05 PM, CRPence wrote:

On 04 Mar 2013 11:52, Vernon Hamberg wrote:

... CPYFRMSTMF appears to do a byte-by-byte conversion, as it seemed
CPYFRMIMPF did as well, just not all the time. So the typical UTF-16
representation for our situation is ANSI characters alternating with
x00's. ...

Ignoring any little-endian data:

Again, that issue [almost positively] is due to the incorrect
tagging; i.e. wrong CCSID. When the STMF is tagged with something other
than 1200 but its data is UTF-16BE, then the copy feature does not work.
So if the file is incorrectly tagged with CCSID-1252 or CCSID-819,
CPYFRMSTMF does not know you lied and tries to convert the data based
on that lie, so the effect will *appear to be* byte-by-byte conversion.
That is because the feature has no idea that the data is two-byte
characters when the CCSID says they are not. Unlike with CPYFRMIMPF,
however, the STMFCCSID [or STMFCODPAG on older releases] parameter can
be used to override the STMF CCSID to 1200.
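
The mis-tagging effect described above can be reproduced outside the system with a few lines of Python. This is only a simulation: the codec names "cp1252", "utf-16-be" and "cp037" are assumed stand-ins for CCSIDs 1252, 1200 and 37, and the convert function is hypothetical, not what CPYFRMSTMF does internally.

```python
def convert(data: bytes, tagged_as: str, target: str = "cp037") -> bytes:
    """Convert data to the target encoding, trusting its tag (hypothetical)."""
    return data.decode(tagged_as).encode(target)

# What is really in the stream file: "AB" as UTF-16BE, i.e. x'00410042'.
stmf_bytes = "AB".encode("utf-16-be")

# Tagged 1252: every byte is treated as one character, so the x'00'
# bytes survive and the result looks like a byte-by-byte conversion.
mistagged = convert(stmf_bytes, "cp1252")

# Tagged (or overridden) as UTF-16BE: the two-byte characters are
# recognized and only the EBCDIC letters remain.
correct = convert(stmf_bytes, "utf-16-be")

print(mistagged.hex(), correct.hex())
```

The first result still carries a null before each letter; the second does not.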

I can use the CPYFRMSTMF to copy [western characters as] UTF-16BE
data into EBCDIC without any issues on v5r3, so it should function
properly [the same] in any later release as well. I just have to ensure
the data in the STMF is really UTF-16BE and the STMF has the *CCSID
attribute of VALUE(1200). The feature recognizes the two-byte control
characters for EOR, so there would be no null character getting sent to
the flat file.