Re: CSV file problems -- MIDRANGE-L

On 21/05/2010, at 12:13 AM, Pat Barber wrote:

Using Sheri's suggestion, I used 37 as the ccsid and did the copy
to the IFS and that created a file with a ccsid 437. I then opened
that file directly with Excel and it opened just fine. I also tried it
with Notepad and it opens it with individual records.

37 is the host EBCDIC CCSID for US English (and a few other variants)

437 is the ASCII equivalent when using the IBM PC Data encoding scheme.

Thus 37 converts happily into 437 and vice versa.
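
As a rough illustration of that round trip, here is a minimal Python sketch. It assumes Python's built-in cp037 and cp437 codecs are reasonable stand-ins for CCSIDs 37 and 437; the sample text is made up.

```python
# Assumption: Python's cp037 and cp437 codecs stand in for CCSID 37 and 437.
text = "Hello"

# Host EBCDIC bytes, as they would sit in a CCSID 37 file.
ebcdic = text.encode("cp037")

# Convert 37 -> 437: decode the EBCDIC bytes, re-encode as IBM PC Data.
ascii_pc = ebcdic.decode("cp037").encode("cp437")

# And back again: 437 -> 37 gives the original EBCDIC bytes.
round_trip = ascii_pc.decode("cp437").encode("cp037")

print(ebcdic.hex(), ascii_pc, round_trip == ebcdic)
```

The same characters survive in both directions; only the bytes that represent them change.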

This entire ccsid thing is a mystery to me. I thought that was related
to different language versions of the OS.

Not directly. The CCSID is a number that indicates a character set (CS) and code page (CP). Character set is the range of characters (e.g., a-z, etc.) supported. Code page is the range of code points used to represent those characters. Thus one can have the same character set implemented with different code points giving the same CS with different CP values, in turn giving a different CCSID. One can also have the same code point represent different characters in different CCSIDs. This is why a $ sign appears as a £ (Pound Sterling, for those of you whose e-mail clients cannot display it) when viewed on a UK system without proper CCSID settings.

US English: CCSID=37, CS=697, CP=37
UK English: CCSID=285, CS=697, CP=285

You can see they support the same character set but use different code points to represent some of those characters. The $ sign in US English uses the same code point as the £ sign in UK English thus without correct CCSID conversion (e.g., because you are still running CCSID 65535 [although it's a bit more complicated than that due to job default CCSID based on language ID]) the $ sign is interpreted and displayed as a £ sign.
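
The $ versus £ effect can be sketched in Python. This is only an approximation: cp037 is assumed to stand in for CCSID 37, and CCSID 285 is imitated by a hand-built override table containing just the one code point mentioned above (x'5B'). The helper name decode_as_uk is hypothetical, and the real CCSID 285 differs from 37 in other positions that are not modelled here.

```python
# Assumption: cp037 stands in for CCSID 37.
US_CODEC = "cp037"

# Hypothetical, partial imitation of CCSID 285: only the code point
# discussed in the text is overridden ($ in 37 is £ in 285).
UK_OVERRIDES = {0x5B: "£"}

def decode_as_uk(data: bytes) -> str:
    """Interpret EBCDIC bytes the way an untranslated UK system would."""
    return "".join(
        UK_OVERRIDES.get(b, bytes([b]).decode(US_CODEC)) for b in data
    )

raw = "$100".encode(US_CODEC)      # written on a US English system
print(raw.hex())                   # the bytes never change
print(raw.decode(US_CODEC))        # read with the correct CCSID
print(decode_as_uk(raw))           # read as if it were CCSID 285
```

The bytes are identical in both cases; only the interpretation differs, which is exactly what goes wrong when no conversion takes place.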

The invariant character set uses the same code points for the same characters in most CCSIDs.
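
A small Python check makes the idea concrete. It assumes the cp037, cp500 and cp1140 codecs are fair stand-ins for the EBCDIC CCSIDs of the same numbers, and it samples only letters and digits rather than the full invariant set. CCSID 285 is left out because the Python standard library does not appear to include a codec for it.

```python
import string

# Assumption: these Python codecs approximate the same-numbered EBCDIC CCSIDs.
CODECS = ["cp037", "cp500", "cp1140"]

# A sample of the invariant character set: letters and digits only.
INVARIANT_SAMPLE = string.ascii_letters + string.digits

def encodings_of(ch: str) -> dict:
    """Map each codec name to the byte it uses for one character."""
    return {codec: ch.encode(codec) for codec in CODECS}

# True when every sampled character has one and the same code point everywhere.
same_everywhere = all(
    len(set(encodings_of(ch).values())) == 1 for ch in INVARIANT_SAMPLE
)

# A variant character for contrast: '!' moves between 37 and 500.
bang = encodings_of("!")

print(same_everywhere, bang)
```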

Some CCSID values are the same as their corresponding CP, some are different, and some CCSIDs have multiple CS and CP values. The similarity of CCSID and CP values gives rise to the erroneous idea that they mean the same thing. CCSIDs with multiple CS and CP values are generally multi-byte encodings with support for Asian languages (e.g., Japanese, Simplified Chinese, etc.). One CS/CP pair is for SBCS data and another pair is for DBCS data.

Because different languages require different character sets to represent the full range of characters used by a given language (e.g., accented characters for so-called Romance languages or Cyrillic for Russian, etc.) there is a relationship between CCSID and language, but individually they are quite separate things.

I just don't understand the cpytoimpf and its results.

I just ran a test using this command:

CPYTOIMPF FROMFILE(TCS/SNPREM) TOSTMF('/home/patrick/snpremtst.csv')
RCDDLM(*CRLF)

That creates a file with ccsid of 37.

That's expected behaviour when using the default value for the STMFCODPAG keyword. See the help text for STMFCODPAG:

"If the stream file does not exist, the code page equivalent of the source database file CCSID is used and associated with the stream file."

Thus the source file is in CCSID 37, the associated CP is 37, the stream file does not exist and so is created with a CP of 37 (or the equivalent CCSID on releases that support CCSID for stream files).

This explains why one common solution to the problem is to pre-create the stream file with an appropriate CCSID. In that case the copy will convert from the source file CCSID to the target stream file CCSID.
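
The rule quoted from the help text can be modelled in a few lines of Python. This is a hypothetical model, not an IBM API: the function copy_to_stream_file and the CCSID-to-codec table are invented for illustration, and the "code page equivalent" of the source CCSID is simplified to the same number.

```python
# Hypothetical model of the default STMFCODPAG behaviour; not an IBM API.
# Assumption: these Python codecs stand in for the listed CCSIDs.
PY_CODEC = {37: "cp037", 437: "cp437", 1252: "cp1252"}

def copy_to_stream_file(source_bytes, source_ccsid, existing_ccsid=None):
    """Return (ccsid, bytes) that the stream file would end up with.

    existing_ccsid is the CCSID of a pre-created stream file, or None
    when the stream file does not exist yet.
    """
    # No existing file: the new file simply takes the source's code page.
    target = existing_ccsid if existing_ccsid is not None else source_ccsid
    text = source_bytes.decode(PY_CODEC[source_ccsid])
    return target, text.encode(PY_CODEC[target])

record = "A,1".encode("cp037")  # a record from a CCSID 37 database file

new_file = copy_to_stream_file(record, 37)
precreated = copy_to_stream_file(record, 37, existing_ccsid=1252)

print(new_file)     # still EBCDIC, tagged 37
print(precreated)   # converted to the pre-created file's CCSID
```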

That's readable using wrklnk but is complete garbage opened in WordPad.

Yep, because it's EBCDIC data so it's legible in option 2 of WRKLNK but will be junk in WordPad or Notepad because they are expecting ASCII data. Note that if you transferred the file to the PC via FTP in text mode it would be converted to ASCII and be legible. Also if you had NetServer set up correctly it would convert to ASCII and again the data would be legible via a share.
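
To see roughly what a PC editor makes of untranslated EBCDIC, here is a minimal Python sketch. It assumes cp037 stands in for CCSID 37 and uses latin-1 as the editor's ASCII-family interpretation (chosen because it can decode every byte value); the record content is made up.

```python
# Assumption: cp037 stands in for CCSID 37; latin-1 plays the PC editor.
record = "ABC,123"

ebcdic = record.encode("cp037")          # what CPYTOIMPF wrote (CCSID 37)

# An ASCII-minded editor reads the same bytes with the wrong code page.
as_seen_by_pc = ebcdic.decode("latin-1")

# A text-mode transfer converts first, so the editor sees the real text.
after_conversion = ebcdic.decode("cp037").encode("ascii").decode("ascii")

print(as_seen_by_pc)
print(after_conversion)
```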

but if I do this:

CPYTOIMPF FROMFILE(TCS/SNPREM) TOSTMF('/home/patrick/snprem437.csv')
MBROPT(*REPLACE) FROMCCSID(37) STMFCODPAG(*PCASCII) RCDDLM(*CRLF)

It is readable in WordPad and Excel and the ccsid in the IFS is 1252 ???

That's because 1252 is the associated ASCII CCSID for EBCDIC 37 in the ISO-8 encoding scheme (the so-called MS-Windows encoding). *PCASCII uses the x'4105' encoding scheme (ISO-8). *STDASCII uses the x'2100' encoding scheme (IBM PC Data). One might more reasonably expect *STDASCII to use the x'4100' encoding scheme which would give a CCSID of 819.

(I've always thought these special values were screwed up and should be:
*PCASCII = x'2100' IBM PC Data e.g., 437
*STDASCII = x'4100' e.g., 819
*WINASCII = x'4105' MS-Windows e.g., 1252

Perhaps *WINASCII should more properly be *ISO8. There should be a *MACASCII too which is also encoding x'4105' but uses different CS and CP values from the MS-Windows variants.)
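
The practical difference between the three ASCII CCSIDs mentioned above shows up outside the plain a-z range. The Python sketch below assumes cp437, latin-1 and cp1252 are suitable stand-ins for CCSIDs 437, 819 and 1252; the pairing with encoding scheme values follows the text, and the helper encode_or_none is hypothetical.

```python
# Encoding scheme -> (example CCSID, assumed Python codec stand-in).
SCHEMES = {
    "x'2100'": ("437", "cp437"),
    "x'4100'": ("819", "latin-1"),
    "x'4105'": ("1252", "cp1252"),
}

def encode_or_none(ch, codec):
    """Return the encoded byte, or None if the character set lacks ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

# Same character, different code points depending on the CCSID.
e_acute = {ccsid: encode_or_none("é", codec) for ccsid, codec in SCHEMES.values()}

# A character that only one of the three character sets contains.
euro = {ccsid: encode_or_none("€", codec) for ccsid, codec in SCHEMES.values()}

print(e_acute)
print(euro)
```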

Is there a correct combination to use if the file is to be sent to
another site ???

Obviously you should use the CCSID appropriate for the other site :)

In practice anything will work as long as the environment (e.g., FTP, NetServer, etc.) is set up to correctly convert to ASCII. Forcing a specific CCSID will work better. Any ASCII CCSID compatible with your target system will work (e.g., 437, 819, or 1252 for US English) but the higher numbers usually have a larger character set and are therefore more useful.

Ideally, you should do away with specific CCSIDs and CS/CP combinations and use Unicode to represent the data. That will allow your stream file (and database files) to hold any character from any character set and any language. The pain of incorrect CCSID tagging will be a long-forgotten thing of the past. CCSID 1208 is UTF-8 and would be a good choice because it is compatible with ASCII data so a CCSID-ignorant ASCII editor will be able to display most of the English text in the file. A CCSID-aware editor will be able to display all the characters.
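
The ASCII compatibility of UTF-8 is easy to demonstrate. In this Python sketch the sample text is made up, and decoding as ASCII with replacement characters is only a stand-in for what a CCSID-ignorant editor might do.

```python
text = "Total: £5"

utf8 = text.encode("utf-8")   # what a CCSID 1208 stream file would hold

# The plain English part is byte-for-byte identical to ASCII.
ascii_part_matches = "Total: ".encode("utf-8") == "Total: ".encode("ascii")

# A CCSID-ignorant ASCII reader: English survives, the £ does not.
naive = utf8.decode("ascii", errors="replace")

# A CCSID-aware reader recovers every character.
aware = utf8.decode("utf-8")

print(utf8, ascii_part_matches)
print(naive)
print(aware)
```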

Regards,
Simon Coulter.
--------------------------------------------------------------------
FlyByNight Software OS/400, i5/OS Technical Specialists

http://www.flybynight.com.au/
Phone: +61 2 6657 8251   Mobile: +61 0411 091 400        /"\
Fax:   +61 2 6657 8251                                   \ /
                                                          X
               ASCII Ribbon campaign against HTML E-Mail / \
--------------------------------------------------------------------