Re: Importing UTF-8 multibytes file -- MIDRANGE-L

@Patrik
Yes, 655535 was a typo, definitely!

@Stephen, @Vern
I was surprised to see how little information there is out there about this
topic, too. So, your information is gold to me.

The IBM system is located in the USA, running V7R3, so the system CCSID is 37.
The process I am working on is currently built on these steps:
- the e-commerce system puts the orders' data files into the IFS: they are
UTF-8 files
- these files are loaded into a DB2 65535-CCSID file via FTP commands (login,
type e, mode block, get <file>)
- later on, those files are read and their content is used to populate other
DB2 37-CCSID files
- finally, some data are read from these DB2 37-CCSID files and FTPed to a
final distributor
Obviously, the final files lack some characters, especially when addresses
are from different parts of the planet.

At first we were asked to produce the final files in UTF-8, but we soon
discovered this wasn't enough to solve the distributor's problems: the data
must be imported correctly before they can be exported correctly, whatever
the export method.

Sure, I know that something must change in the way data are stored during
their life in the IBM system, but my first question is how to read them
from the IFS and store them in DB2 files.

The UTF-8 files to be imported are multibyte files, meaning each
character can take 1, 2, 3, or 4 bytes.
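
To make the byte counts concrete, here is a small Python sketch. Python is
used only for illustration and is not part of the process described above;
the sample characters are arbitrary picks, except Š, whose UTF-8 bytes are
the C5A0 pair mentioned in the original question.

```python
# Illustration only: how many bytes UTF-8 needs for a few sample characters.
samples = {
    "A": "ASCII letter",
    "Š": "Latin capital S with caron",
    "€": "euro sign",
    "😀": "emoji",
}

def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))

for ch, label in samples.items():
    hex_bytes = ch.encode("utf-8").hex().upper()
    print(f"{label}: {utf8_length(ch)} byte(s), hex {hex_bytes}")
```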

Before reading your answers I was already trying to read the file using
SQL, like this:
select "LINE" from
table(QSYS2.IFS_READ_UTF8('/home/MLSTOPPA/MyUtf8File.TXT'));
Then I would have used the %ucs2 BIF to read every single character
(yes, Vern!) and compose a converted string to write to the DB2
65535-CCSID file, but what I gather from your comments is that this will
never work, will it?

In other words, are you telling me I can't, under any circumstances, load
these data into a DB2 65535-CCSID file? And that a file with a specific
code page has to be used instead?

If a DB2 1208-CCSID file, or a file with some 1208-CCSID fields, is to be
used instead, how can I read the data needed to complete the order? Do I
always have to convert the data in my RPG programs?

Thank you very much
Lucia

On Fri, Feb 4, 2022 at 15:30, Vern Hamberg via MIDRANGE-L <
midrange-l@xxxxxxxxxxxxxxxxxx> wrote:

+1 to this

UTF-8 can use 1, 2, 3, or 4 bytes to represent a character, even emojis
- there's no way to copy it back and forth between 1208 and 65535 or 37
or whatever EBCDIC you use in your country.

Even with 65535, I believe everything is handled using single bytes, so
it won't work for copying both ways. It would be interesting to see, of
course, whether the raw bytes are copied over. But presenting the data
out of 65535, IF all the bytes are copied, would be impossible, due to
the different lengths of the character representation. Or do you want to
write your own program to process UTF-8? I'm smiling as I suggest that
- no way I want to.
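
A rough Python sketch of the two points above, under stated assumptions:
the sample word is made up, `fits_cleanly` is a hypothetical helper, and
Python's cp037 codec stands in for CCSID 37. It shows that a byte-for-byte
copy keeps the UTF-8 data intact, that reading those bytes as single-byte
EBCDIC gives one unrelated character per byte, and that a cut at a fixed
byte position can split a multibyte character.

```python
# Illustration only: UTF-8 bytes kept "as is" in a no-conversion field.
utf8_bytes = "Šibenik".encode("utf-8")  # 8 bytes for 7 characters

# A byte-for-byte copy is lossless: decoding the same bytes gives the text back.
assert utf8_bytes.decode("utf-8") == "Šibenik"

# Reading the same bytes as single-byte EBCDIC 37 yields one character per
# byte, so the text comes out as unrelated characters.
as_ebcdic = utf8_bytes.decode("cp037")
print(len(utf8_bytes), len(as_ebcdic), repr(as_ebcdic))

def fits_cleanly(data: bytes, width: int) -> bool:
    """Check whether cutting data at a byte width still leaves valid UTF-8."""
    try:
        data[:width].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Cutting after byte 1 splits the two-byte Š; cutting after byte 2 does not.
print(fits_cleanly(utf8_bytes, 1), fits_cleanly(utf8_bytes, 2))
```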

There are things, like emojis, that will not convert into any of the
EBCDIC character sets. We use SQL's XML/JSON functions to import the
data - anything it doesn't recognize is converted to X'3F', I believe.
Does RPG recognize UTF-8 now? I haven't checked for a while. It has
recognized UTF-16 and UCS-2.

Cheers
Vern

On 2/4/2022 8:12 AM, Stephen Landess wrote:

Maria -

I spent the last 15 years working in a multinational JDE shop with 54
different environments comprising most of the major languages and character
sets in the world.

I was surprised how little information was available about character
conversions in forums such as Midrange-L and Midrange-RPG when I first
started working there in 2006. I finally found Scott Klement's web site
and found a wealth of information from him.

If you have multinational character set data in the IFS file (i.e., data
from different countries which have varying character sets), then the
safest way to handle it is to create a new file on the IBM i with
CCSID(1208) {UTF-8}, CCSID(1200) {UTF-16}, or CCSID(13488) {UCS-2} and use
CPYFRMIMPF to copy the data from the IFS file to the IBM i file using the
appropriate from and to CCSIDs, and then use the data in the new file in
your applications. This may require using *UCS fields in RPG programs.
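
The three Unicode CCSIDs differ in how many bytes a character takes. The
Python sketch below is only an illustration: Python's utf-8 and utf-16-be
codecs stand in for CCSIDs 1208 and 1200 (that mapping is shorthand, not
something the system exposes this way), and `fits_ucs2` is a hypothetical
helper. UCS-2 only covers characters that fit in one 16-bit unit, so
anything that needs a surrogate pair, such as an emoji, falls outside it.

```python
# Illustration only: size of one character in UTF-8 and UTF-16.
def sizes(ch: str) -> tuple:
    """Return (bytes in UTF-8, bytes in big-endian UTF-16) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

def fits_ucs2(ch: str) -> bool:
    """True when the character fits in one 16-bit unit (no surrogate pair)."""
    return ord(ch) <= 0xFFFF

for ch in ("A", "Š", "€", "😀"):
    print(ch, sizes(ch), fits_ucs2(ch))
```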

If the IFS file is data from a particular country, then the CCSID 1208
data can be converted to EBCDIC into your current file by using the
appropriate EBCDIC CCSID as the TOCCSID() in CPYFRMIMPF, and the OS will
convert the data from 1208 to EBCDIC. However, when the file is defined using
CCSID(65535) you'll need to set the job CCSID to match the EBCDIC CCSID of
the data in order to use it...
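
As a rough illustration of the single-country case, the Python sketch below
checks whether a piece of text survives a trip through a given EBCDIC code
page. Python's cp1026 (Turkish) and cp037 (US) codecs stand in for the
corresponding CCSIDs; the sample text and the `survives` helper are made up
for the example.

```python
# Illustration only: text from one country fits that country's EBCDIC page.
text = "İstanbul"

def survives(text: str, codec: str) -> bool:
    """True when text can be encoded with codec and decoded back unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False

print(survives(text, "cp1026"))  # Turkish EBCDIC
print(survives(text, "cp037"))   # US EBCDIC
```

The same text that converts cleanly to the matching code page cannot be
represented in the US one, which is why the target CCSID has to match the
data.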

Feel free to call me if you need further information.

Regards,
Steve Landess
512-289-0387

Maria wrote:

Hi all,
Hope you are all fine!
For a new customer of mine, I need to import into IBM i running V7R3 a
UTF-8 multibyte file. So, the UTF-8 multibyte file is on IFS and has
CCSID set to 1208, whereas the DB2 flat file is set to CCSID = 655535.

The UTF-8 file is multibyte because it may contain worldwide addresses.

No matter which command I use, be it FTP or CPYFRMIMPF,
the hexadecimal representation of a multibyte character (e.g. C5A0)
ends up as just one byte (in our example, 3F);
right after, if I FTP the same DB2 flat file back to the IFS, using TYPE C
1208,
the resulting file is different from the original one:
while in the original file there is a multibyte character (e.g. C5A0), now
there is a single byte (1A).
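
The 3F and 1A values are consistent with a substitution on the way in: Š
(C5A0 in UTF-8) has no code point in CCSID 37, X'3F' is the EBCDIC
substitute character, and X'3F' converts back to the Unicode SUBSTITUTE
control, which is the single byte 1A in UTF-8. The Python sketch below
mimics that round trip. `to_ebcdic37` is a hypothetical helper, Python's
cp037 codec stands in for CCSID 37, and writing X'3F' for unmappable
characters is an assumption about the system conversion based on the bytes
reported above.

```python
# Illustration only: mimic a lossy UTF-8 -> EBCDIC 37 -> UTF-8 round trip.
EBCDIC_SUB = b"\x3f"  # EBCDIC substitute character

def to_ebcdic37(text: str) -> bytes:
    """Encode with cp037, writing X'3F' for unmappable characters (assumed)."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp037")
        except UnicodeEncodeError:
            out += EBCDIC_SUB
    return bytes(out)

original = bytes.fromhex("C5A0")                  # UTF-8 for Š
stored = to_ebcdic37(original.decode("utf-8"))    # what lands in the field
exported = stored.decode("cp037").encode("utf-8") # what comes back out

print(original.hex(), stored.hex(), exported.hex())
```

Once the substitution has happened the original character cannot be
recovered from the stored byte.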

In your opinion and experience, is there a way to import and then export a
UTF-8 file to and from the Power system so that the resulting file is
the same as the original one?

Am I really obliged to read the UTF-8 file from the IFS byte by
byte and do some sort of conversion myself?

I already searched the mailing list without success, so I decided
to ask, because I am pretty sure this is a common issue.

--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
