On Fri, Aug 11, 2017 at 5:47 PM, James H. H. Lampert
<jamesl@xxxxxxxxxxxxxxxxx> wrote:
DAMN, these things are counterintuitive.

Even though I was specifying MY CCSID, rather than

See, this is one of the things that, in my opinion, make text
processing in various encodings tricky.

The fact that you even have a notion of "my CCSID" versus "the file's
CCSID" is one CCSID too many, at least to deal with at one time.

Python 3 and Joel Spolsky's article on Unicode cleared things up for
me very nicely. In Python 3, the internal representation of text is an
implementation detail, and one that you normally don't have to be
aware of. You are encouraged to think of Python text as Spolsky's
"platonic ideal characters". No specific encoding, just conceptually
*human-understandable text*. Python 3 has mechanisms to help nudge you
along this path. (Indeed, this was by far the most fundamental change
between Python 2 and Python 3.)
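A small Python 3 illustration of that idea follows. A str has no encoding attached, and concrete bytes only appear when you encode. The choice of "cp037" assumes Python's codec for EBCDIC US/Canada is the counterpart of IBM's CCSID 37, and "utf-8" stands in for CCSID 1208. The sample string is arbitrary.

```python
# A Python 3 str is a sequence of Unicode characters; no CCSID is attached to it.
text = "café"
print(len(text))  # counts characters, not bytes

# Concrete bytes (and therefore a concrete encoding) appear only when you encode.
utf8_bytes = text.encode("utf-8")    # assumed equivalent of CCSID 1208
ebcdic_bytes = text.encode("cp037")  # assumed equivalent of EBCDIC CCSID 37

print(len(utf8_bytes))    # 'é' takes two bytes in UTF-8
print(len(ebcdic_bytes))  # one byte per character in this EBCDIC code page

# Different bytes, same characters once decoded with the right codec.
print(utf8_bytes == ebcdic_bytes)
print(ebcdic_bytes.decode("cp037") == utf8_bytes.decode("utf-8"))
```

The same four characters become five bytes in one encoding and four in the other. That difference is exactly what you stop having to think about once the text is decoded.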

When you are reading text into your "host language" from an external
source, you do have to know what the actual (not merely stated) CCSID
of the external source is. You need to know that you have to *decode*
that into "platonic ideal characters" (or whatever your host language
uses for its internal representation of text, which preferably you
can put out of your mind entirely!). Then, when it comes time to
output data to an external target, you have to *encode* your
platonic ideal characters into some concrete CCSID.
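The decode-at-input, encode-at-output discipline might look roughly like this in Python. This is a minimal sketch only. The CCSID_TO_CODEC table and the read_text/write_text helpers are hypothetical names made up for illustration, and they cover just a few CCSIDs whose Python codec equivalents I am fairly sure of (37, 500, 819, 1208).

```python
# Hypothetical mapping from IBM CCSIDs to Python codec names (illustrative only).
CCSID_TO_CODEC = {
    37: "cp037",     # EBCDIC US/Canada
    500: "cp500",    # EBCDIC International
    819: "latin-1",  # ISO 8859-1
    1208: "utf-8",   # UTF-8
}


def read_text(raw: bytes, actual_ccsid: int) -> str:
    """Decode external bytes using the CCSID they are *really* in."""
    return raw.decode(CCSID_TO_CODEC[actual_ccsid])


def write_text(text: str, target_ccsid: int) -> bytes:
    """Encode characters into the CCSID the external target expects."""
    return text.encode(CCSID_TO_CODEC[target_ccsid])


# Bytes that arrived from an EBCDIC (CCSID 37) source.
incoming = "Hello, world".encode("cp037")

text = read_text(incoming, 37)     # decode once, at the input boundary
text = text.upper()                # work on plain characters in between
outgoing = write_text(text, 1208)  # encode once, at the output boundary
print(outgoing)

# Trusting a wrong (merely stated) CCSID gives garbage, not an error,
# because Latin-1 will happily decode any byte value.
wrong = read_text(incoming, 819)
print(wrong == "Hello, world")
```

The last two lines are the "actual, not merely stated" point in miniature. If the data is labeled with the wrong CCSID, the decode step can still succeed and quietly produce the wrong characters.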

RPG doesn't have such clean text machinery, but it might help if you
can get into the mindset I've described.

John Y.


This mailing list archive is Copyright 1997-2020 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].