On Wed, May 13, 2020 at 10:36 AM <smith5646midrange@xxxxxxxxx> wrote:
> Also, some of this terminology might be wrong such as the "ASCII DBCS".
> Please feel free to correct any terminology so I know for later when I am
> trying to explain this.

I agree with your instinct that "ASCII DBCS" may be wrong terminology.
However, it's just an instinct for me too, because I am not an expert
on IBM i terminology.
What I can tell you is that ASCII is, by definition, a 7-bit encoding
scheme in which every character fits in a single byte, so it doesn't
really "go with" DBCS (double-byte character set). I've noticed that
many folks have a mental model built on a dichotomy between EBCDIC and
ASCII, in which all the "non-EBCDIC mainstream stuff" out there in the
non-IBM world is therefore "ASCII".
That's a gross oversimplification, but an understandable one, because
it would be fair to say that most of the non-EBCDIC encoding schemes
out there were outgrowths of, or successors to, ASCII, and were
purposely designed to be supersets of the original ASCII.
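To make the "superset" idea concrete, here is a small Python sketch.
It is only an illustration, not anything specific to IBM i. The sample
strings and the choice of codecs (Python's built-in "utf-8",
"latin-1", "cp1252", and the EBCDIC code page "cp037") are my own
picks for the demonstration.

```python
# Pure-ASCII text encodes to the same bytes in ASCII and in encodings
# that were designed as supersets of ASCII.
text = "ID,NAME"
ascii_bytes = text.encode("ascii")
same_in_supersets = all(
    text.encode(codec) == ascii_bytes
    for codec in ("utf-8", "latin-1", "cp1252")
)

# EBCDIC (code page 037 here) is not an ASCII superset: the same text
# produces different bytes.
ebcdic_bytes = text.encode("cp037")
differs_in_ebcdic = ebcdic_bytes != ascii_bytes

# Characters outside ASCII need more than one byte in UTF-8, and
# cannot be encoded as ASCII at all.
e_acute_len = len("é".encode("utf-8"))   # 2 bytes
kanji_len = len("日".encode("utf-8"))     # 3 bytes
try:
    "é".encode("ascii")
    ascii_rejects_non_ascii = False
except UnicodeEncodeError:
    ascii_rejects_non_ascii = True

print(same_in_supersets, differs_in_ebcdic, e_acute_len, kanji_len)
```

The practical upshot is that a file containing only plain English
letters, digits, and punctuation looks identical whether you call it
ASCII or UTF-8. The differences only show up once other characters
appear.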
By now, most of the world has converged on various Unicode encodings,
so it's likely the CSV in question is using one of those, especially
UTF-8.
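If you want to test that guess against the actual file, a rough check
along these lines can help. This is a sketch under my own assumptions:
guess_encoding is a hypothetical helper, not a library function, and
it only looks for byte-order marks and then tries strict decoding. It
cannot positively identify legacy double-byte encodings, and it does
not distinguish UTF-32 from UTF-16.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Return a rough guess at the encoding of some raw file bytes."""
    # A byte-order mark at the start is the strongest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Note: a UTF-32 LE BOM also starts with these bytes; not handled.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Pure ASCII is also valid UTF-8, so check it first to report it.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # Strict UTF-8 decoding fails on most non-UTF-8 multi-byte data.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


print(guess_encoding("ID,NAME\n1,名前\n".encode("utf-8")))
```

To try it on a real file, read the file in binary mode (open with
mode "rb") and pass the bytes in. A result of "unknown" means the file
is neither ASCII nor valid UTF-8, and you would need to find out from
whoever produced it which encoding they used.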
I highly recommend reading the following to get a basic understanding
of Unicode and the issues surrounding character encoding:
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
If that link is too long or cut off awkwardly, try
https://bit.ly/35Uq7zY
The explanation and terminology in that article are almost completely
oblivious to IBM i, but the article still serves as a great foundation
for refining one's mental model of encoding.
John Y.