


John,

The Unicode "standard" says that the BOM should only be included in UTF-8 when necessary. (I'm paraphrasing... but it's something to that effect.)

You are saying that it's not necessary here because Notepad++ can figure it out? The trouble with that is that Notepad++ has to read THE WHOLE FILE and analyze every single byte. Even then it can't be 100% certain. It assumes that if the bytes form valid UTF-8 sequences, the file must be UTF-8 rather than some single-byte "extended ASCII" code page that happens to use the same byte values...
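To illustrate the kind of guess described above, here is a minimal Python sketch. It is not how Notepad++ actually does it (I don't know its internals); looks_like_utf8 is just a name I made up, and the approach is simply "try a strict decode of the whole buffer and see if it fails".

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the whole buffer is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict decoding has to walk every byte
    except UnicodeDecodeError:
        return False
    return True


sample = "é".encode("utf-8")           # the two bytes C3 A9
is_utf8 = looks_like_utf8(sample)      # the bytes are well-formed UTF-8
lone_e9 = looks_like_utf8(b"\xe9")     # a lone E9 byte is not well-formed UTF-8
as_cp1252 = sample.decode("cp1252")    # the same two bytes are also legal Windows-1252 text
```

The last line is the catch: the bytes C3 A9 pass the UTF-8 check, but they are equally valid as two Windows-1252 characters, so a passing check is strong evidence and not proof.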

I agree that Notepad++ does a nice job of this... but don't you think it'd be a LOT nicer if applications could determine this by just reading the first 3 bytes? Not to mention much more performant? And that's what the BOM does in this case.
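For comparison, the 3-byte check could look something like this in Python. This is only a sketch; starts_with_utf8_bom and file_has_utf8_bom are names I invented for the example. codecs.BOM_UTF8 is the standard library constant for the bytes EF BB BF.

```python
import codecs


def starts_with_utf8_bom(head: bytes) -> bool:
    """Return True if the buffer begins with the UTF-8 BOM (EF BB BF)."""
    return head.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Read only the first 3 bytes of the file, no matter how large it is."""
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(3))
```

If you are producing such files from Python, the "utf-8-sig" codec writes the BOM when encoding and strips it when decoding.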

On IBM i, perhaps, it is unnecessary since the CCSID in the file description is even more efficient. But... on Windows, where there is no CCSID, I'd say it's "necessary". It makes applications (by major vendors like Microsoft whose software I cannot change) work properly, and makes things much clearer and more efficient even for those that can detect UTF-8 properly without it.

Just my take on it, of course.



On 3/14/17 6:01 PM, John Yeung wrote:

Here there is definitely disagreement. Not from me personally, but the
consensus interpretation of the Unicode standard is that the BOM is
discouraged for UTF-8. Microsoft in particular goes against this
sentiment and requires the BOM, using it roughly analogously to the
way IBM i uses CCSID.

I will point out (without making any recommendation one way or the
other) that it's not impossible to set the CCSID correctly, even
without the BOM. You gave the example of Notepad++ which already does
this. (Well, it's Windows software, so doesn't directly use CCSID, but
the point is it doesn't need the BOM to detect UTF-8.)

John Y.




This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
