Re: Problem copying PC document to DDS file with accentuated characters -- MIDRANGE-L

Hello Vern,

On 31.03.2021 at 16:54, Vern Hamberg <vhamberg@xxxxxxxxxxxxxxx> wrote:

UTF-8 and UTF-16 MIGHT have what is called a BOM, or byte order mark.

I've never seen BOMs "in the wild", neither on Mac, nor on Linux. Same goes for UTF-16. Maybe this is more prevalent in Asia?
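
For anyone who wants to check their own files for a BOM, here is a minimal Python sketch. It only compares the leading bytes of a file against the BOM constants from the standard `codecs` module; the function name `detect_bom` is made up for illustration.

```python
import codecs

# Check the UTF-32 marks first: the UTF-32 LE mark (FF FE 00 00)
# starts with the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(detect_bom(codecs.BOM_UTF8 + "Grüße".encode("utf-8")))
    print(detect_bom("Grüße".encode("utf-8")))
```

A result of `None` only says that no BOM is present. It says nothing about which encoding the file actually uses.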

On the i, what I did was try to copy the text file to one with 1208 CCSID - if successful, I considered the contents to be UTF-8. Not great but mostly useful.

I created a file with German umlauts in UTF-8 encoding on my Mac, FTP'ed it to the IFS of a 7.2 machine after first setting the CCSID to 1208, and viewed the file with wrklnk, option 5: the umlauts are there. Proof enough for me that it works as expected.
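
The "try to convert it as 1208 and see whether that succeeds" heuristic can be imitated off-platform. This is only a sketch of the same idea in Python, not what the copy on the i does internally; `looks_like_utf8` is a made-up name.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8("Grüße".encode("utf-8")))
    print(looks_like_utf8("Grüße".encode("latin-1")))
```

As in the original approach, this is "not great but mostly useful": plain ASCII also passes the test, and a successful decode is no proof that the author meant UTF-8.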

There might be a caveat when QCCSID isn't set to a meaningful value. For me, it's 273, the same as I chose in Mocha 5250.
Today I learned that everything works fine with the default value (65535) as long as you're not transferring data to another IBM i with a different primary language. If you do, some characters in, e.g., SRC PFs look strange. Months ago, when viewing C includes in QSYSINC in a 5250 session, I saw German umlauts and other funny characters where different kinds of braces are supposed to be. Now I know why: the file has a CCSID of 37!
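
The braces-turn-into-umlauts effect can be reproduced off-platform. A minimal sketch, assuming that Python's `cp037` and `cp273` codecs are close enough stand-ins for CCSIDs 37 and 273; `view_as` is a made-up helper that encodes text with one code page and decodes the same bytes with another, which is roughly what happens when no translation takes place.

```python
def view_as(text: str, stored: str = "cp037", displayed: str = "cp273") -> str:
    """Encode text in the 'stored' code page, then misread the bytes as 'displayed'."""
    return text.encode(stored).decode(displayed)


if __name__ == "__main__":
    source_line = "int main(void) { return 0; }"
    # Letters, digits, parentheses and the semicolon sit at the same code
    # points in both pages; the curly braces do not.
    print(view_as(source_line))  # int main(void) ä return 0; ü
```

The bytes on disk are identical in both cases. Only the interpretation differs, which is why tagging the file with the right CCSID matters.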

As far as I've understood, changing QCCSID basically enables dynamic translation of data on disk for the particular display device, and it should be set according to the primary language installed on the system.

I don't know how all the trendy web stuff fits in, though. Maybe others can elaborate on that topic? Especially because there *should* be a meta header in each HTML output, telling the browser the content-type and charset.

Patrik, by the flag in metadata, do you mean the CCSID or the code page?

CCSID. The only systems I know of that have a concept of flagging files externally (without opening them and guessing at the content) with a CCSID (or an encoding, on other systems) are IBM i and its predecessors.

Other systems don't use those at all, there's basically no metadata that I know
of in text files.

That's also what I observed.

Now PK-ZIP files, they have PK in the first 2 bytes, other file types do similar markings. But not text files.

And many other byte stream files do, too: the so-called "magic bytes". They are more helpful than guessing what's inside from the file name extension.

https://en.wikipedia.org/wiki/File_(command)
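
A toy version of the magic-bytes idea in Python. It is a sketch with a handful of well-known signatures only, nowhere near what file(1) knows; `MAGIC` and `sniff` are made-up names.

```python
# A few well-known signatures found at the very start of a file.
MAGIC = [
    (b"PK\x03\x04", "zip"),
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def sniff(data: bytes):
    """Return a file type guessed from the leading bytes, or None if unknown."""
    for signature, kind in MAGIC:
        if data.startswith(signature):
            return kind
    return None


if __name__ == "__main__":
    print(sniff(b"PK\x03\x04" + b"\x00" * 26))
    print(sniff("Grüße aus dem IFS".encode("utf-8")))
```

Plain text comes back as `None`, which is exactly the point made above: text files carry no such marker.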

(On a side note, the older Macs had a good way to record what's inside a file and which default application should be launched by double-clicking: https://en.wikipedia.org/wiki/Creator_code. It did not record the charset being used, though, if the type was TEXT.)

:wq! PoC
