On 21/05/2010, at 12:13 AM, Pat Barber wrote:
> Using Sheri's suggestion, I used 37 as the ccsid and did the copy
> to the IFS and that created a file with a ccsid 437. I then opened
> that file directly with Excel and it opened just fine. I also tried it
> with Notepad and it opens it with individual records.
37 is the host EBCDIC CCSID for US English (and a few other variants).
437 is the ASCII equivalent when using the IBM PC Data encoding scheme.
Thus 37 converts happily into 437 and vice versa.
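
As a rough illustration of that round trip, here is a minimal Python sketch. It assumes Python's cp037 and cp437 codecs are acceptable stand-ins for CCSID 37 and CCSID 437; the sample text is made up and this is not what CPYTOIMPF does internally.

```python
# Assumption: Python's "cp037" and "cp437" codecs stand in for CCSID 37 and 437.
text = "ACME,1234.50"          # hypothetical sample record

ebcdic = text.encode("cp037")                         # host EBCDIC bytes
ascii_pc = ebcdic.decode("cp037").encode("cp437")     # 37 -> 437
round_trip = ascii_pc.decode("cp437").encode("cp037") # 437 -> 37

print(ebcdic.hex())    # EBCDIC code points
print(ascii_pc.hex())  # PC ASCII code points for the same characters
print(round_trip == ebcdic)
```

The characters are the same at every step; only the code points used to store them change.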
> This entire ccsid thing is a mystery to me. I thought that was related
> to different language versions of the OS.
Not directly. The CCSID is a number that indicates a character set
(CS) and code page (CP). Character set is the range of characters
(e.g., a-z, etc.) supported. Code page is the range of code points
used to represent those characters. Thus one can have the same
character set implemented with different code points giving the same
CS with different CP values, in turn giving a different CCSID. One can
also have the same code point represent different characters in
different CCSIDs. This is why a $ sign appears as a £ (Pound Sterling,
for those of you whose e-mail clients cannot display it) when viewed on
a UK system without proper CCSID settings.
US English: CCSID=37, CS=697, CP=37
UK English: CCSID=285, CS=697, CP=285
You can see they support the same character set but use different code
points to represent some of those characters. The $ sign in US English
uses the same code point as the £ sign in UK English thus without
correct CCSID conversion (e.g., because you are still running CCSID
65535 [although it's a bit more complicated than that due to job
default CCSID based on language ID]) the $ sign is interpreted and
displayed as a £ sign.
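
A small Python sketch of the effect, under stated assumptions: Python's cp037 codec stands in for CCSID 37, and because Python ships no codec for CCSID 285, decode_uk_sketch is a hypothetical partial decoder that overrides only the one code point discussed above. It is not a full CCSID 285 table.

```python
# Assumption: "cp037" stands in for CCSID 37 (US English EBCDIC).
DOLLAR_BYTE = "$".encode("cp037")   # the code point that holds $ in CCSID 37

def decode_uk_sketch(data: bytes) -> str:
    """Hypothetical, partial CCSID 285 decoder.

    Starts from the CCSID 37 table and overrides only the code point
    that the text says is a pound sign on a UK system.
    """
    out = []
    for b in data:
        if bytes([b]) == DOLLAR_BYTE:
            out.append("£")
        else:
            out.append(bytes([b]).decode("cp037"))
    return "".join(out)

amount = "$100".encode("cp037")      # stored on a US system
print(amount.decode("cp037"))        # read back with the correct CCSID
print(decode_uk_sketch(amount))      # same bytes read as if they were UK data
```

The bytes never change; only the table used to interpret them does, which is exactly what happens when no conversion takes place.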
The invariant character set uses the same code points for the same
characters in most CCSIDs.
Some CCSID values are the same as their corresponding CP, some are
different, and some CCSIDs have multiple CS and CP values. The
similarity of CCSID and CP values gives rise to the erroneous idea
that they mean the same thing. CCSIDs with multiple CS and CP values
are generally multi-byte encodings with support for Asian languages
(e.g., Japanese, Simplified Chinese, etc.). One CS/CP pair is for
SBCS data and another pair is for DBCS data.
Because different languages require different character sets to
represent the full range of characters used by a given language (e.g.,
accented characters for so-called Romance languages or Cyrillic for
Russian, etc.) there is a relationship between CCSID and language, but
they are quite separate things.
> I just don't understand the cpytoimpf and its results.
> I just ran a test using this command:
> CPYTOIMPF FROMFILE(TCS/SNPREM) TOSTMF('/home/patrick/snpremtst.csv')
> RCDDLM(*CRLF)
> That creates a file with ccsid of 37.
That's expected behaviour when using the default value for the
STMFCODPAG keyword. See the help text for STMFCODPAG:
"If the stream file does not exist, the code page equivalent of the
source database file CCSID is used and associated with the stream file."
Thus the source file is in CCSID 37, the associated CP is 37, the
stream file does not exist and so is created with a CP of 37 (or the
equivalent CCSID on releases that support CCSID for stream files).
This explains why one common solution to the problem is to pre-create
the stream file with an appropriate CCSID. In that case the copy will
convert from the source file CCSID to the target stream file CCSID.
> That's readable using wrklnk but is complete garbage opened in
> WordPad.
Yep, because it's EBCDIC data so it's legible in option 2 of WRKLNK
but will be junk in WordPad or Notepad because they are expecting
ASCII data. Note that if you transferred the file to the PC via FTP in
text mode it would be converted to ASCII and be legible. Also if you
had NetServer set up correctly it would convert to ASCII and again the
data would be legible via a share.
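
To see why the PC editors show junk, here is a hedged Python sketch. It assumes cp037 approximates the EBCDIC data in the stream file and cp1252 approximates what a Windows editor expects; the record content is invented.

```python
# Assumptions: "cp037" ~ CCSID 37 data in the stream file,
# "cp1252" ~ the encoding a Windows editor assumes.
record = "SMITH,42"                  # hypothetical record
ebcdic_bytes = record.encode("cp037")

# What an ASCII-expecting editor makes of the raw EBCDIC bytes.
as_editor_sees_it = ebcdic_bytes.decode("cp1252", errors="replace")

# What a converting transfer (e.g., text-mode FTP) would deliver instead.
converted = ebcdic_bytes.decode("cp037").encode("cp1252")

print(repr(as_editor_sees_it))       # junk characters
print(converted.decode("cp1252"))    # legible text
```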
> but if I do this:
> CPYTOIMPF FROMFILE(TCS/SNPREM) TOSTMF('/home/patrick/snprem437.csv')
> MBROPT(*REPLACE) FROMCCSID(37) STMFCODPAG(*PCASCII) RCDDLM(*CRLF)
> It is readable in WordPad and Excel and the ccsid in the IFS is
> 1252 ???
That's because 1252 is the associated ASCII CCSID for EBCDIC 37 in the
ISO-8 encoding scheme (the so-called MS-Windows encoding). *PCASCII
uses the x'4105' encoding scheme (ISO-8). *STDASCII uses the x'2100'
encoding scheme (IBM PC Data). One might more reasonably expect
*STDASCII to use the x'4100' encoding scheme which would give a CCSID
of 819.
(I've always thought these special values were screwed up and should be:
*PCASCII = x'2100' IBM PC Data e.g., 437
*STDASCII = x'4100' e.g., 819
*WINASCII = x'4105' MS-Windows e.g., 1252
Perhaps *WINASCII should more properly be *ISO8. There should be a
*MACASCII too which is also encoding x'4105' but uses different CS and
CP values from the MS-Windows variants.)
> Is there a correct combination to use if the file is to be sent to
> another site ???
Obviously you should use the CCSID appropriate for the other site :)
In practice anything will work as long as the environment (e.g., FTP,
NetServer, etc.) is set up to correctly convert to ASCII. Forcing a
specific CCSID will work better. Any ASCII CCSID compatible with your
target system will work (e.g., 437, 819, or 1252 for US English) but
the higher numbers usually have a larger character set and are
therefore more useful.
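
A quick way to compare what those three ASCII CCSIDs can hold is sketched below in Python. The mapping of CCSID 437, 819 and 1252 to the codecs cp437, latin_1 and cp1252 is an assumption made for illustration, and the sample characters are arbitrary.

```python
# Assumption: these Python codecs approximate the named CCSIDs.
CODECS = {"437": "cp437", "819": "latin_1", "1252": "cp1252"}
SAMPLES = ["A", "£", "©", "€"]       # arbitrary sample characters

def can_encode(ch: str, codec: str) -> bool:
    """Return True if the codec has a code point for the character."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

coverage = {
    ch: {ccsid: can_encode(ch, codec) for ccsid, codec in CODECS.items()}
    for ch in SAMPLES
}

for ch, result in coverage.items():
    print(ch, result)
```

Of these samples, only 1252 can hold the euro sign, which is the sort of difference that matters when choosing a CCSID for the receiving site.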
Ideally, you should do away with specific CCSIDs and CS/CP
combinations and use Unicode to represent the data. That will allow
your stream file (and database files) to hold any character from any
character set and any language. The pain of incorrect CCSID tagging
will be a long-forgotten thing of the past. CCSID 1208 is UTF-8 and
would be a good choice because it is compatible with ASCII data so a
CCSID-ignorant ASCII editor will be able to display most of the
English text in the file. A CCSID-aware editor will be able to display
all the characters.
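
The ASCII compatibility of UTF-8 can be sketched in Python as follows. The sample text is invented, and decoding as ASCII with replacement characters is only a rough model of a CCSID-ignorant editor.

```python
text = "Total: £25 (café)"           # hypothetical file content
utf8 = text.encode("utf-8")          # what a CCSID 1208 stream file would hold

# Rough model of a CCSID-ignorant ASCII editor: bytes outside the
# ASCII range cannot be shown and are replaced.
ascii_view = utf8.decode("ascii", errors="replace")

# A UTF-8 aware editor recovers every character.
full_view = utf8.decode("utf-8")

print(ascii_view)
print(full_view)
```

The plain English portion survives in both views because UTF-8 stores those characters with the same single bytes as ASCII.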
Regards,
Simon Coulter.
--------------------------------------------------------------------
FlyByNight Software OS/400, i5/OS Technical Specialists
http://www.flybynight.com.au/
Phone: +61 2 6657 8251 Mobile: +61 0411 091 400 /"\
Fax: +61 2 6657 8251 \ /
X
ASCII Ribbon campaign against HTML E-Mail / \
--------------------------------------------------------------------