On 04 Mar 2013 09:07, Vernon Hamberg wrote:
>> I believe I've an answer that ends up being very flexible.
>> CPYFRMSTMF ... DBFCCSID(*FILE) ... CVTDTA(*AUTO)
>> <<SNIP>>
>> This was preceded by a CRTPF QTEMP/FLATFILE RCDLEN(3000)
>
> That should convert to EBCDIC using the Job [default] CCSID. The
> program-described file effectively has no CCSID, but *AUTO still must
> effect conversion from ASCII encoding to EBCDIC encoding. That would
> sure limit the data that can be processed; i.e. sure defeats the purpose
> of having used UTF-16 :-)

Agreed - but this data is restricted to North American use for now, so this seems safe enough. And we are trying to get away from some user deciding how to save an email attachment, where they might be told to save as ANSI. But users make mistakes!

>> With UTF-16, there is an extra row interleaved - because there is a
>> Unicode CRLF, and the conversion sees the CR and the LF as separate.
>> No problem - this is easy to clean up!
>
> I do not see that issue on v5r3; my files have just *LF. Seems like
> a defect. Or perhaps I do not understand what is being described?

When I specify ENDLINFMT(*ALL), CPYFRMSTMF treats any eligible end-of-line marker as the end of a record. That marker is removed.
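
As a rough illustration of the interleaved-row effect described above, here is a small Python sketch. It is only a model of the reported behaviour, not how CPYFRMSTMF is actually implemented: the `split_records` helper is hypothetical, and it assumes that with ENDLINFMT(*ALL) each CR, LF, CRLF or LFCR found in the byte stream ends a record. In UTF-16LE the CR and LF are separated by an x00 byte, so they are no longer seen as one CRLF.

```python
import re

# Assumption: any of CR, LF, CRLF or LFCR ends a record, which is a
# simplified model of ENDLINFMT(*ALL) working on raw bytes.
EOL = re.compile(rb"\r\n|\n\r|\r|\n")


def split_records(data: bytes) -> list[bytes]:
    """Split a byte stream into records, removing the end-of-line markers."""
    records = EOL.split(data)
    # Drop the empty piece left after a final end-of-line marker.
    if records and records[-1] == b"":
        records.pop()
    return records


text = "AB\r\nCD\r\n"
ansi_records = split_records(text.encode("cp1252"))      # ANSI: CR and LF are adjacent
utf16_records = split_records(text.encode("utf-16-le"))  # UTF-16LE: x00 sits between them

print(ansi_records)
print(utf16_records)
```

In this model the ANSI stream yields two records, while the UTF-16LE stream yields the same two rows plus short rows that hold only a stray x00, which matches the "extra row interleaved" symptom.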

>> <<SNIP>> And the nulls (UTF-16 only) can be cleaned up with an SQL
>> REPLACE function. <<SNIP>>
>
> Why would a null character appear in /text/ data? The only expected
> control characters in a text file are EOR; e.g. *CR, *CRLF, *LF, *LFCR.

CPYFRMSTMF appears to do a byte-by-byte conversion, as it seemed CPYFRMIMPF did as well, just not all the time. So the typical UTF-16 representation for our situation is ANSI characters alternating with x00's. We also have tab characters, which end up as x05 in the PF.
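
To make the byte-by-byte effect concrete, here is a hedged Python sketch. It assumes the stream file is tagged 1252 and uses Python's `cp037` codec as a stand-in for the job default EBCDIC CCSID (an assumption; the real job CCSID may differ). The `bytewise_to_ebcdic` helper is hypothetical and only models the observed result. The sample is plain ASCII text with one tab.

```python
def bytewise_to_ebcdic(stream: bytes) -> bytes:
    """Translate each byte as if it were a 1252 character, ignoring UTF-16 pairing."""
    # cp037 stands in for the job default CCSID here (assumption).
    return stream.decode("cp1252").encode("cp037")


raw = "A\tB".encode("utf-16-le")        # bytes 41 00 09 00 42 00
ebcdic = bytewise_to_ebcdic(raw)        # every second byte stays x00
cleaned = ebcdic.replace(b"\x00", b"")  # same idea as the SQL REPLACE cleanup

print(ebcdic.hex(" "))
print(cleaned.hex(" "))
```

The tab comes out as x05 and each character is followed by an x00, so removing the x00 bytes leaves ordinary EBCDIC text. That only holds while every character fits in a single 1252 byte.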

>> Any cautions are much appreciated
>
> The ability to have embedded CRLF in delimited column data would be
> lost, because the stream is split into database records for each
> apparent EOR, even if the control characters were not meant to be seen
> as EOR. Obviously having to choose a fixed record length can be an
> issue, since there is no such limit for the stream data.

Probably not an issue here - I've thought of this, but this is all textual data and would have no control characters in it, other than the tab.

>> still, this does look pretty cool - no transform needed, similar
>> effect to how we were using CPYFRMIMPF for ANSI-encoded stream files.
>
> So does that mean that the noted CPYFRMSTMF is functional for both
> UTF-16BE and UTF-16LE, such that effectively it does the transform of
> the data to enable the CCSID xlation from 1200 to the defaulted EBCDIC?
> If it can handle the transform, then it would seem odd that the
> Byte-Order-Mark (BOM) would not be dropped per its no longer having
> meaning in a database file.mbr.

It doesn't do a 1200 - EBCDIC transform - remember that these stream files are flagged with CCSID 1252, as coming from Windows. They would likely be flagged as 819 if FTP is the transfer mechanism, but these are coming in emails.
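
For contrast, a genuine UTF-16 to EBCDIC transform of the kind asked about above would pick the byte order from the BOM and then drop it. The Python sketch below shows that idea only. The `utf16_to_ebcdic` helper is hypothetical, `cp037` is an assumed stand-in for the target EBCDIC CCSID, and nothing here claims to reflect what CPYFRMSTMF does internally.

```python
import codecs


def utf16_to_ebcdic(stream: bytes, ebcdic_codec: str = "cp037") -> bytes:
    """Decode BOM-prefixed UTF-16 (either byte order) and re-encode as EBCDIC."""
    if stream.startswith(codecs.BOM_UTF16_LE):
        text = stream[2:].decode("utf-16-le")  # BOM consumed, not copied
    elif stream.startswith(codecs.BOM_UTF16_BE):
        text = stream[2:].decode("utf-16-be")
    else:
        raise ValueError("no UTF-16 byte-order mark found")
    return text.encode(ebcdic_codec)


le = codecs.BOM_UTF16_LE + "A\tB".encode("utf-16-le")
be = codecs.BOM_UTF16_BE + "A\tB".encode("utf-16-be")

print(utf16_to_ebcdic(le).hex(" "))
print(utf16_to_ebcdic(be).hex(" "))
```

Both byte orders give the same EBCDIC bytes, with no BOM and no x00 padding left behind.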