On 27-Sep-2016 17:51 -0500, John Yeung wrote:
On Tue, Sep 27, 2016 at 3:47 PM, CRPence wrote:
If the code point 0x0A is required to be stored in the EBCDIC
application data [to avoid changing the application], then
consider:
That 0x0A is the Repeat (RPT) control character in EBCDIC, into
which the Single Shift Two (SS2) control character should translate
from the code point 0x8E in ASCII.
I would emphasize *should*.
And I *should* have done that :-)
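If you want to check the RPT/SS2 pairing without an IBM i handy, Python's codec tables can serve as a rough stand-in. The sketch below assumes CCSID 819 on the ASCII side (Python's `latin-1`) and CCSID 37 on the EBCDIC side (Python's `cp037`). The thread never says which EBCDIC CCSID the application uses, and Python's tables are not IBM's conversion services, so treat this as an illustration rather than proof.

```python
# Sketch: does ASCII 0x8E (SS2) land on EBCDIC 0x0A (RPT)?
# Assumptions: ASCII side = CCSID 819 ("latin-1"), EBCDIC side = CCSID 37 ("cp037").

ascii_bytes = b"\x8e"                   # SS2 as stored by the ASCII application
text = ascii_bytes.decode("latin-1")    # becomes the C1 control U+008E
ebcdic_bytes = text.encode("cp037")     # convert to EBCDIC

print(hex(ord(text)))      # 0x8e
print(ebcdic_bytes.hex())  # 0a

# The reverse trip: EBCDIC 0x0A (RPT) comes back as ASCII 0x8E (SS2).
round_trip = b"\x0a".decode("cp037").encode("latin-1")
print(round_trip.hex())    # 8e
```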
In reality, "ASCII" is not some universal, monolithic standard. There
are many, many variants, especially when it comes to 8-bit values.
Not all flavors of ASCII have an 0x8E defined, and some common
variants have it defined as something else (notably CP437 and
CP1252). Even UTF-8, the dominant encoding on the Web and one that
most people think "will handle everything", will have problems if you
try the 0x8E "trick". (You would need to use 0xC28E instead.)
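To make the point about variants concrete, here is a minimal sketch that decodes the same byte 0x8E under a few flavors. Python codec names are assumed stand-ins for the named code pages.

```python
# Sketch: what the single byte 0x8E means under a few "ASCII" flavors.
# Assumption: Python codec names stand in for the IBM/Microsoft code pages
# (latin-1 ~ CP819, cp1252 ~ CP1252, cp437 ~ CP437).
raw = b"\x8e"

meanings = {codec: ord(raw.decode(codec)) for codec in ("latin-1", "cp1252", "cp437")}
for codec, code_point in meanings.items():
    print(f"{codec}: U+{code_point:04X}")
# latin-1 yields the SS2 control (U+008E); cp1252 yields Z-caron (U+017D);
# cp437 yields A-umlaut (U+00C4).

# UTF-8 cannot carry SS2 in one byte; it needs the pair 0xC2 0x8E.
ss2_utf8 = "\x8e".encode("utf-8")
print(ss2_utf8.hex())  # c28e

# A lone 0x8E is a continuation byte, so it is not valid UTF-8 on its own.
try:
    raw.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False
print(lone_byte_is_valid_utf8)  # False
```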
It is likely that the ASCII flavor being used here is CP819, and in
that case, I believe the 0x8E character will be translated as
desired to 0x0A. But it's worth being aware of the other ASCII
variants, especially since we're talking about PHP.
The link to the URL showing the control characters for conversion
between ISO-8 and EBCDIC was meant to imply an operating assumption:
use of CP00819 [and CCSID 819] as ISO 8859-1 ASCII [i.e. the same as
the "Global Use: Syntactic Character Set in Single-Byte ISO-8 Encoding
scheme x4100"], or of the nearly identical ISO 8859-15 ASCII
multilingual CP00923 [and CCSID 923], which includes Euro support.
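As a rough check on the "nearly identical" claim, this sketch compares the C1 range of the two ISO encodings and confirms that 0x8E still reaches EBCDIC 0x0A from either one. The codec names, and CCSID 37 as the EBCDIC target, are assumptions for illustration.

```python
# Sketch: CCSID 819 (ISO 8859-1) vs CCSID 923 (ISO 8859-15) for the C1 range.
# Assumptions: Python's "latin-1" and "iso8859_15" codecs stand in for those
# CCSIDs, and "cp037" stands in for the (unstated) EBCDIC CCSID 37.
c1_bytes = bytes(range(0x80, 0xA0))
same_c1 = c1_bytes.decode("latin-1") == c1_bytes.decode("iso8859_15")
print(same_c1)  # True: both leave 0x80-0x9F as C1 controls

# The two differ in printable positions, e.g. 0xA4.
currency_819 = ord(b"\xa4".decode("latin-1"))     # U+00A4 CURRENCY SIGN
currency_923 = ord(b"\xa4".decode("iso8859_15"))  # U+20AC EURO SIGN
print(hex(currency_819), hex(currency_923))       # 0xa4 0x20ac

# Either way, SS2 at 0x8E still converts to EBCDIC 0x0A (RPT).
to_ebcdic = {
    codec: b"\x8e".decode(codec).encode("cp037")
    for codec in ("latin-1", "iso8859_15")
}
for codec, ebcdic in to_ebcdic.items():
    print(codec, ebcdic.hex())
```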
On further review of that page, one could easily infer that an IBM-PC
code page [encoding scheme x2100], lacking any translation to the
EBCDIC code point 0x0A, would not have any character to effect the
same /circumvention/; i.e. that eliminates use of CP0437. Does much of
anything really still use that encoding?
CDRA Appendix G. Control character mappings
[http://www.ibm.com/software/globalization/cdra/appendix_g.html#ISO-8%20to%20EBCDIC]
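The CP0437 inference can be approximated in a few lines: scan every CP437 byte for the SS2 control, then see where the PC's own 0x8E ends up. Python's codecs are assumed stand-ins here, and they do not reproduce CDRA's IBM-PC control mappings exactly.

```python
# Sketch: can any CP437 byte stand in for SS2 (U+008E) on the way to EBCDIC?
# Assumptions: Python's "cp437" and "cp037" codecs approximate CCSIDs 437 and 37;
# CDRA's own IBM-PC control mappings (encoding scheme x2100) may differ in detail.
all_cp437 = bytes(range(256)).decode("cp437")
cp437_has_ss2 = "\x8e" in all_cp437
print(cp437_has_ss2)  # False: no CP437 byte decodes to U+008E

# The PC byte 0x8E is a graphic (A-umlaut), so it converts to that letter's
# EBCDIC code point rather than to RPT at 0x0A.
pc_8e_in_ebcdic = b"\x8e".decode("cp437").encode("cp037")
print(pc_8e_in_ebcdic.hex(), pc_8e_in_ebcdic != b"\x0a")
```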
FWiW, part of the reason I was loath to suggest 0x8E was not just that
it might function [presently, but later come back to figuratively bite
them], but also that it might not function. Yet I was not interested in
trying to suss out, first, what encoding scheme was currently being
used, because knowing might not even help to predict the effect. I
figured it better just to suggest one option, rather than offering
several options along with an attempt to explain each; dumb it down and
leave the learning about why it did or did not help to the reader :-)
As for UTF8 [CCSID 1208], I figured that if they were using UTF8, then
they probably would already have experienced difficulties, given the
inability to translate so many [and a growing number of] possible
characters into such a small code page; and often in such scenarios,
they would already have had to convert their field to store UTF8 or
effectively UTF16 data to prevent data loss. But as to the Microsoft
Windows encoding scheme (x4105) [e.g. CCSID 1252], I did not even think
to look, despite knowing that with that encoding, MS had re-purposed
most of the code points in the range 0x80 to 0x9F with actual glyphs; I
did think it funny that, from what shows in the block of that and the
surrounding characters, there was apparently about a 50% chance of that
0x8E control character having survived ;-)
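The UTF8 reasoning above can be illustrated with a hypothetical sample string. A single-byte EBCDIC field cannot hold all of it, and forcing it through with substitution loses data. A UTF-16 field (e.g. CCSID 1200) keeps every character, SS2 included. The field CCSIDs and the sample text are assumptions, not details from the thread.

```python
# Sketch: why a UTF-8 data source would likely have hit trouble already.
# Assumptions: the target field is single-byte EBCDIC CCSID 37 ("cp037"),
# the sample text is arbitrary, and "utf-16-be" stands in for a UTF-16 field.
def fits_in(text: str, codec: str) -> bool:
    """Return True if every character of text has a code point in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

sample = "Zürich, 東京"
print(fits_in(sample, "cp037"))  # False: the CJK characters have no CCSID 37 code point

# Forcing it through with substitution silently loses data.
lossy = sample.encode("cp037", errors="replace").decode("cp037")
print(lossy == sample)  # False

# A UTF-16 field keeps everything, including the SS2 control itself.
wide = (sample + "\x8e").encode("utf-16-be")
print(wide.decode("utf-16-be") == sample + "\x8e")  # True
print(wide[-2:].hex())  # 008e
```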