MIDRANGE dot COM Mailing List Archive



Home » MIDRANGE-L » February 2013

CPYFRMIMPF and Unicode - UTF-16 in particular




Hi

We are on 7.1; I don't know the PTF level. I'm looking at importing tab-delimited data from text files that are in little-endian UTF-16 (called "Unicode" in Notepad's Save As dialog), which is what Windows uses by default. Yes, Notepad also has an option to save as big-endian UTF-16 (called "Unicode big endian"), and I may be recommending that. Or just use ANSI. But mistakes do happen, and it'd be nice to have the program handle whatever it is given.

I also tried saving the text file with TextPad, which also has the 2 UTF-16 "endian" options. There is a difference in how Notepad and TextPad do it: Notepad puts the BOM (Byte Order Mark) at the start of the file, which is x'FEFF' for big-endian (normal order, the one IBM i uses) and x'FFFE' for little-endian (bytes in reverse order). TextPad writes no BOM.
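To make the BOM distinction concrete, here is a minimal Python sketch of how a PC-side check could tell the variants apart before the file ever reaches the IFS. The function name utf16_byte_order is made up for illustration; only the two BOM values come from the description above.

```python
import codecs

def utf16_byte_order(data: bytes):
    """Return 'big', 'little', or None when the data carries no UTF-16 BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):   # x'FEFF'
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):   # x'FFFE'
        return "little"
    return None                                # e.g. a file saved without a BOM
```

A file saved without a BOM returns None here, so the byte order would have to be known or guessed some other way.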

In any case, we want to be able to process Unicode text files. So I have 4 Unicode text files, tab-delimited.

*** It seems that CPYFRMIMPF converts individual bytes, not characters, as I'll show below. ***

CPY will do the conversion correctly for the big-endian ones, but for little-endian, the result is an empty stream file.

So a double-clutch works for the big-endian files: CPY first to do the conversion, then CPYFRMIMPF from the converted copy.

2 questions:
1. Can CPYFRMIMPF actually do this correctly? You can see the parameters I tried below.
2. Can IBM i handle UTF-16LE (little-endian)? Is there a CCSID for that, like 1200 is for UTF-16BE (big-endian)?

Thanks
Vern

uc-big-endian-notepad.txt (x'FEFF' at start of file)
uc-little-endian-notepad.txt (x'FFFE' at start of file)
uc-big-endian-textpad.txt (nothing at start of file)
uc-little-endian-textpad.txt (nothing at start of file)
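One possible workaround, sketched here in Python as an assumption and not as anything IBM i provides, is to normalize all four variants to big-endian UTF-16 with a BOM on the PC side before transfer. The function name to_utf16_be is hypothetical, and the zero-byte count used for BOM-less files is only a heuristic that suits mostly-ASCII data like this.

```python
import codecs

def to_utf16_be(data: bytes) -> bytes:
    """Re-encode UTF-16 data as big-endian with a leading x'FEFF' BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    elif data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    else:
        # No BOM: guess. For mostly-ASCII text the x'00' high byte
        # sits in even positions for big-endian, odd for little-endian.
        zeros_even = data[0::2].count(0)
        zeros_odd = data[1::2].count(0)
        codec = "utf-16-be" if zeros_even >= zeros_odd else "utf-16-le"
        text = data.decode(codec)
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
```

Reading each file in binary mode, passing the bytes through this function, and writing the result back out would leave every file in the one form that CPY converted correctly in my tests.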

The CPYFRMIMPF command I'm testing looks like this. A sample result follows the first one; the others are similar. What looks like a blank in the clear text is actually x'00', and note that x'FEFF' is converted to x'8EDF'.

CPYFRMIMPF FROMSTMF('uc-big-endian-notepad.txt') TOFILE(VERN/FLAT3000) MBROPT(*REPLACE) FROMCCSID(1200) TOCCSID(37) RCDDLM(*ALL) FLDDLM('~')
þÿ b r a n d s t o r e _
8D0809080908000A0A09090806
EF02090105040502030609050D
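The hex dump above is what a byte-by-byte conversion would produce. The following Python sketch reproduces it under stated assumptions: the sample text "brand", tab, "store_" is read back from the dump (x'05' is the EBCDIC tab), Python's cp037 codec stands in for CCSID 37, and latin-1 stands in for whatever single-byte source encoding is effectively being applied.

```python
# Sample record reconstructed from the dump: "brand" + tab + "store_"
sample = "brand\tstore_"
raw = b"\xfe\xff" + sample.encode("utf-16-be")   # big-endian UTF-16 with BOM

# Byte-wise: treat every byte as its own single-byte character.
bytewise = raw.decode("latin-1").encode("cp037")

# Character-wise: decode the 2-byte code units first, then convert.
charwise = raw.decode("utf-16").encode("cp037")

print(bytewise.hex().upper())   # begins 8EDF0082..., like the dump
print(charwise.hex().upper())   # one EBCDIC byte per character, no x'00'
```

The byte-wise result starts with x'8EDF' and keeps a x'00' in front of every letter, which matches what landed in FLAT3000. The character-wise result is what I would expect a correct conversion to look like.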

CPYFRMIMPF FROMSTMF('uc-little-endian-notepad.txt') TOFILE(VERN/FLAT3000) MBROPT(*REPLACE) FROMCCSID(1200) TOCCSID(37) RCDDLM(*ALL) FLDDLM('~')

CPYFRMIMPF FROMSTMF('uc-big-endian-textpad.txt') TOFILE(VERN/FLAT3000) MBROPT(*REPLACE) FROMCCSID(1200) TOCCSID(37) RCDDLM(*ALL) FLDDLM('~')

CPYFRMIMPF FROMSTMF('uc-little-endian-textpad.txt') TOFILE(VERN/FLAT3000) MBROPT(*REPLACE) FROMCCSID(1200) TOCCSID(37) RCDDLM(*ALL) FLDDLM('~')






This mailing list archive is Copyright 1997-2014 by MIDRANGE dot COM and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.