Y'all

Joel Spolsky's article, the one John linked to and which I've seen before, has a lot of good information, but he is clearly anti-EBCDIC, as this passage shows:

You probably think I’m going to talk about very old character sets like EBCDIC here. Well, I won’t. EBCDIC is not relevant to your life. We don’t have to go that far back in time.

I looked up CCSID 859 and was taken to a page on code page 850. It turns out 850 is a variant of CP 437, the original PC-DOS character set (plain US ASCII in the lower half, extended characters in the upper half). 850 replaces several box-drawing characters with Latin letters that have diacritics, those accent marks. Apparently 850 has been replaced in many instances by Windows 1252, which one source says has the same letters as 850. Basically, 850 was created to handle European languages.
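To make the 437 versus 850 difference concrete, here is a minimal Python sketch that decodes the same few byte values under each code page using Python's built-in codecs. The byte values are illustrative picks of mine, not anything from the thread, and Python's codec names (cp437, cp850, cp1252) are only assumed to line up with the CCSIDs of the same number.

```python
# Decode the same byte values under three PC code pages.
# SAMPLE_BYTES are illustrative picks, not values taken from the thread.
SAMPLE_BYTES = [0x82, 0xB5, 0xB6, 0xB7]
CODE_PAGES = ["cp437", "cp850", "cp1252"]


def decode_table(byte_values, code_pages):
    """Return {byte: {code_page: character}} for each byte and code page."""
    table = {}
    for value in byte_values:
        raw = bytes([value])
        # errors="replace" keeps the sketch running if a byte is unassigned.
        table[value] = {cp: raw.decode(cp, errors="replace") for cp in code_pages}
    return table


if __name__ == "__main__":
    for value, row in decode_table(SAMPLE_BYTES, CODE_PAGES).items():
        cells = "  ".join(f"{cp}={ch!r}" for cp, ch in row.items())
        print(f"0x{value:02X}: {cells}")
```

Byte 0x82 is é in both 437 and 850, while 0xB5 is a box-drawing character in 437 and Á in 850. Windows 1252 has é as well, but at a different byte (0xE9), so the same letter being available does not mean the same bytes.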

Anyone remember making graphics in DOS? Too much fun back then!

So text files in Windows are not necessarily tagged as being any of these, right?

Now I see files that I've uploaded using FTP marked differently, often 1252, I think.

I just uploaded a simple text file using ACS, and the result was CCSID 437 - it had no diacritics in the text. You in Europe might be getting the 850 due to your locale.

But when I saved something as UTF-8 using TextPad, it still got uploaded as 437, and the same happened with UTF-16. I also do not see any BOMs (byte order marks), which I thought were required in UTF-16 and optional in UTF-8. Guess I was mistaken.
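One way to check for a BOM without relying on how the file got tagged is to look at its first bytes on the PC side. The following is a rough Python sketch; detect_bom is a hypothetical helper name, and it only covers UTF-8 and UTF-16 (UTF-32 is ignored, even though its little-endian BOM starts with the same two bytes as UTF-16 LE). It says nothing about what ACS or FTP do during transfer.

```python
import codecs

# BOM byte sequences as defined in Python's codecs module.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    # Python's "utf-8-sig" and "utf-16" codecs write a BOM;
    # "utf-8" and "utf-16-le" do not.
    for codec in ["utf-8", "utf-8-sig", "utf-16", "utf-16-le"]:
        encoded = "abc".encode(codec)
        print(codec, encoded[:4].hex(" "), detect_bom(encoded))
```

For a real file, the same check could be run on the first few bytes read in binary mode, for example `detect_bom(open(path, "rb").read(4))`, where `path` is whatever file you saved from the editor.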

This is a lot of words to say that we probably aren't going to get a simple solution that doesn't involve telling the transfer process what we need. Also, is there a way to set Java properties in the IFS action? Does it matter?

Too many words for sure! Someone help me, please!

Regards
Vern

On 3/30/2021 3:38 AM, Patrik Schindler wrote:
Hello John,

On 29.03.2021 at 15:44, John Yeung <gallium.arsenide@xxxxxxxxx> wrote:

I don't know when "MBCS" was introduced, so maybe you are speaking from a context where SBCS and DBCS are the only possible choices. But UTF-8 is definitely an MBCS encoding. The part of UTF-8 that overlaps with ASCII consists of one-byte characters, but any given character[1] can use up to 4 bytes to encode in UTF-8.
You're perfectly right, in a generic way. But I'm cautious about assuming anything when it comes to encodings and IBM i. First, because the original request was about French accented characters, and I don't know whether these are in the SB or MB range.
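The byte counts discussed here can be checked with a short Python sketch. The sample characters are my own picks. The EBCDIC codecs cp037 and cp500 are used only because Python ships them; they are stand-ins, and I am assuming they behave like the French EBCDIC CCSIDs in this respect.

```python
# How many bytes does each character need in UTF-8?
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji

# Single-byte code pages available in Python; cp037/cp500 are EBCDIC stand-ins.
SINGLE_BYTE_CODECS = ["cp850", "cp1252", "cp037", "cp500"]
FRENCH_ACCENTS = "\u00e9\u00e8\u00e0\u00e7\u00f9"  # e-acute, e-grave, a-grave, c-cedilla, u-grave


def utf8_length(ch):
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_single_byte(text, codec):
    """True if every character in text encodes to exactly one byte in codec."""
    try:
        return all(len(ch.encode(codec)) == 1 for ch in text)
    except UnicodeEncodeError:
        return False  # character not present in this code page


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s) in UTF-8")
    for codec in SINGLE_BYTE_CODECS:
        print(codec, fits_single_byte(FRENCH_ACCENTS, codec))
```

In this sketch the French accented letters take two bytes each in UTF-8, so they fall in the multi-byte range there, while in the single-byte code pages tested they occupy one byte each.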

CCSID is a special IBM construct that is a little bit more complicated than the notion of "encoding" that the rest of the world uses.
I still have the "Speak the right language…" PDF from IBM on my reading list. :-)

But I would say a good first step (and for many, the only step that's really needed) is to pretend that CCSIDs really are "just" encodings in the same sense that the rest of the world uses, and then learn about those encodings. And yet again, I will point to my favorite article on the topic:

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
Thanks! I've put it on my reading list.

I know about iconv, and I like that BBEdit on my Mac most often guesses the encoding right. Vim does likewise. In theory, this ambiguity should be resolved by IBM i saving the "encoding" as metadata per file. But of course, that metadata must also be correct. And I assume that is where the culprit from the OP lies.

:wq! PoC




This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
