On Fri, Aug 11, 2017 at 5:47 PM, James H. H. Lampert
<jamesl@xxxxxxxxxxxxxxxxx> wrote:
> DAMN, these things are counterintuitive.
> Even though I was specifying MY CCSID, rather than
> the FILE's CCSID.
See, this is one of the things that, in my opinion, make text
processing in various encodings tricky.
The fact that you even have a notion of "my CCSID" versus "the file's
CCSID" is one too many CCSIDs. At least to deal with at one time.
Python 3 and Joel Spolsky's article on Unicode cleared things up for
me very nicely. In Python 3, the internal representation of text is an
implementation detail, and one that you normally don't have to be
aware of. You are encouraged to think of Python text as Spolsky's
"platonic ideal characters". No specific encoding, just conceptually
*human-understandable text*. Python 3 has mechanisms to help nudge you
along this path. (Indeed, this was by far the most fundamental change
between Python 2 and Python 3.)
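To illustrate the kind of nudge I mean, here is a minimal sketch (the variable names and the sample word are just made up for illustration). Python 3 keeps text (`str`) and encoded data (`bytes`) as separate types and refuses to mix them silently:

```python
# Text is a sequence of characters; no encoding is implied.
text = "café"

# Encoded data is a sequence of bytes in one specific encoding.
data = text.encode("utf-8")

char_count = len(text)   # counts characters
byte_count = len(data)   # counts bytes; "é" takes two bytes in UTF-8

# Mixing the two types is an error rather than a silent guess.
try:
    combined = text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Python 2 would have attempted an implicit conversion at that last step, which is where a lot of the confusion used to come from.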
When you are reading text into your "host language" from an external
source, you do have to know what the actual (not merely stated) CCSID
of the external source is. You need to *decode* that into "platonic
ideal characters" (or whatever your host language uses for its internal
representation of text - which preferably you can put out of your mind
entirely!). Then, when it comes time to output data to an external
target, you have to *encode* your platonic ideal characters into some
concrete CCSID.
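In Python 3 terms, the decode-then-encode flow might look like the sketch below. The `transcode` helper is a hypothetical name of my own, and I am assuming the external source is really in CCSID 37 (Python's `cp037` codec) and the target wants UTF-8; substitute whatever your actual source and target use.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode bytes from the source encoding, then encode for the target."""
    # Step 1: decode the external bytes into plain Python text.
    text = data.decode(source_encoding)
    # Step 2: encode that text into the target's concrete encoding.
    return text.encode(target_encoding)


# "Hello" as EBCDIC bytes, assuming CCSID 37 (codec name "cp037").
ebcdic_bytes = b"\xc8\x85\x93\x93\x96"

# In between the two steps there is only text, with no CCSID attached.
text = ebcdic_bytes.decode("cp037")

utf8_bytes = transcode(ebcdic_bytes, "cp037", "utf-8")
```

Note that there are only ever two encodings in play, one at each boundary, and never both at the same moment.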
RPG doesn't have such clean text machinery, but it might help if you
can get into the mindset I've described.
John Y.