RE: Chinese characters in an interface file -- MIDRANGE-L

After some research, I agree that I have UTF-8 data.

Right now the characters are Chinese, but I know that if I code for Chinese
only, something else will come through and bite me.

Normally, I would try to keep the data instead of killing it, but blanking
it is what I was told to do. The weirder part is that the fields they are
having issues with are no longer used in the file on the IBM i (there is a
separate interface to the other systems), so why I have to keep the
non-Chinese values and blank out the Chinese values is beyond me. It would
be much easier to blank the entire column.

I am playing with some of the ideas given along with a few others.

-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
Barbara Morris
Sent: Wednesday, May 13, 2020 12:14 PM
To: midrange-l@xxxxxxxxxxxxxxxxxx
Subject: Re: Chinese characters in an interface file

On 2020-05-13 11:22 a.m., Vernon Hamberg wrote:

Hi

It is possible that the CSV file is in UTF-8 -- the hex 31 you
describe is what the number 1 would be in UTF-8.
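
To illustrate the byte values involved, here is a small Python sketch.
Python is used only for illustration, and its cp037 codec is assumed to be
a close stand-in for CCSID 37. The digit 1 is x'31' in UTF-8 but x'F1' in
EBCDIC:

```python
# The digit "1" as bytes in UTF-8 versus EBCDIC (Python codec cp037).
utf8_one = "1".encode("utf-8")
ebcdic_one = "1".encode("cp037")

print(utf8_one.hex())    # 31
print(ebcdic_one.hex())  # f1
```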
...

If it IS UTF-8, you might try marking the file with CCSID 1208 and do
a text transfer, not a binary transfer.

Or mark the field in your PF as CCSID 1208, do the binary transfer,
then see what RPG does with it. In RPG, do an EVAL from the UTF-8
field to a regular CCSID 37 EBCDIC field.
...

I agree with Vern that it is likely that you have UTF-8 data.

If you get the data into a UTF-8 RPG field and assign it to a CCSID 37
field using RPG, the characters that can't convert to CCSID(37) will be
x'3F'. You could then scan for x'3F' and blank the field if you find one.
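
The scan-and-blank idea can be sketched outside RPG as follows. This is
only a rough Python model of it, not the RPG code itself. It assumes
Python's cp037 codec matches CCSID 37, and the function names are made up
for illustration. Python does not emit x'3F' by itself, so the sketch
inserts it by hand for every character that will not convert. Note that a
real U+001A in the input also maps to x'3F', so it would be treated as
unconvertible too.

```python
SUB = b"\x3f"    # EBCDIC substitute character
BLANK = b"\x40"  # EBCDIC space

def to_ccsid37(text: str) -> bytes:
    """Convert to cp037, writing x'3F' for each unconvertible character."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp037")
        except UnicodeEncodeError:
            out += SUB
    return bytes(out)

def blank_if_unconvertible(utf8_bytes: bytes, width: int) -> bytes:
    """Return a fixed-width cp037 field, all blanks if anything failed."""
    converted = to_ccsid37(utf8_bytes.decode("utf-8"))
    if SUB in converted:
        return BLANK * width
    # Pad with EBCDIC spaces, then truncate to the field width.
    return converted.ljust(width, BLANK)[:width]

print(blank_if_unconvertible("ABC".encode("utf-8"), 5).hex())
print(blank_if_unconvertible("中文".encode("utf-8"), 5).hex())
```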

Or, if you're specifically only concerned with Chinese, you could assign to
a CCSID(937) field, and then scan for x'0E' to see if there are any Chinese
characters.
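
Python's standard library has no codec for CCSID 937, so the shift-out
scan cannot be reproduced there directly. As a stand-in, the sketch below
swaps in a different technique: it checks the decoded UTF-8 text for code
points in the CJK Unified Ideographs blocks. The function name is
hypothetical, and the ranges also match Han characters used in Japanese
and Korean text, so treat it as an approximation.

```python
def has_han_characters(utf8_bytes: bytes) -> bool:
    """True if the UTF-8 data contains any CJK ideograph (approximation)."""
    text = utf8_bytes.decode("utf-8")
    return any(
        0x4E00 <= ord(ch) <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= ord(ch) <= 0x4DBF   # CJK Unified Ideographs Extension A
        for ch in text
    )

print(has_han_characters("Part 中文".encode("utf-8")))  # True
print(has_han_characters("Part 123".encode("utf-8")))   # False
```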

But ... following the Never Lose Data principle, I think it would be a
zillion times better to save the UTF-8 data as is and not try to convert it
to CCSID 37, and especially not blank the field if it has Chinese data.

Just guessing, but assuming you do need this in CCSID 37, wouldn't it be
better to just replace the unconvertible characters with, say, '?' than to
blank out the entire field?

--
Barbara
