On Tue, Mar 1, 2016 at 2:46 AM, Scott Klement
<midrange-l@xxxxxxxxxxxxxxxx> wrote:
> How do you determine the CCSID of an Excel sheet? If you mean an actual
> Excel sheet (and not a "save as text") it's always UTF-8 (CCSID 1208).

I don't think it's necessarily always UTF-8. I believe the Office Open
XML specification allows either UTF-8 or UTF-16. It might be that, in
practice, everyone always chooses UTF-8, especially nowadays. But
there is a lot of third-party software out there for working with
Excel files, of all different ages and all different quality levels,
and on all different platforms. So some could have made a different
choice (including by accident!).
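If you want to check what a particular .xlsx file actually uses, here is
a rough sketch in Python using only the standard library. It assumes the
file is a normal zip-based Office Open XML package and simply looks at
the byte order mark and XML declaration of each part. The function names
(sniff_xml_encoding, declared_encodings) are made up for illustration;
they are not from any Excel package.

```python
import re
import zipfile

def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of one XML part from its first bytes."""
    # A UTF-16 byte order mark means the declaration itself is UTF-16.
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "UTF-16"
    # Skip an optional UTF-8 byte order mark.
    if data[:3] == b"\xef\xbb\xbf":
        data = data[3:]
    # Otherwise read the encoding named in the XML declaration, if any.
    m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    # With no BOM and no declared encoding, XML defaults to UTF-8.
    return m.group(1).decode("ascii").upper() if m else "UTF-8"

def declared_encodings(xlsx):
    """Map each XML part in an .xlsx package to its sniffed encoding."""
    result = {}
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            if name.endswith((".xml", ".rels")):
                result[name] = sniff_xml_encoding(zf.read(name)[:200])
    return result
```

Calling declared_encodings("book.xlsx") (the file name is a placeholder)
returns a dict keyed by part name. This only inspects the container; it
does not replace letting a proper Excel library decode the data.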

I don't think it was mentioned that the older .xls (BIFF) format was a
possibility, but if you allow for that, then the door is definitely
wide open for UTF-16 and perhaps even other encodings.

Normally, folks don't have to worry about the specific encoding inside
an Excel file, though, because whatever package or tool they are using
to get the data out *should* already be handling it. (Likewise,
whatever they're using to put data into an Excel file should be
writing a valid encoding.)

The approach taken by all the Python packages (and I would guess POI
as well, but I don't know) is to use pure, *unencoded* Unicode text
while the data is in memory. That is, from the programmer's
perspective, text that is under the control of Python itself is
encoding-agnostic. It's only when you have to read or write a file (or
other external data source) that you have to choose an actual
encoding.
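To make that concrete, here is a small sketch in plain Python 3 (the
sample string is just an illustration). The str holds code points; bytes
only appear when you pick an encoding at the boundary.

```python
# Text held in a Python str is a sequence of Unicode code points, not bytes.
text = "Größe: 5 €"

# An encoding is chosen only at the boundary, when text becomes bytes.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-be")

# Different encodings produce different bytes ...
assert as_utf8 != as_utf16

# ... but decoding each with its own codec recovers the same in-memory text.
assert as_utf8.decode("utf-8") == text
assert as_utf16.decode("utf-16-be") == text
```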

And this is why I recommended this as a "safe" way to transfer data.
You let the Excel-reading package figure out how to extract the data,
and once it's extracted, you have it in a pure form; then when you
write it out, you choose which encoding to use (via Python or Java or
whatever).
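A sketch of the "write it out" half, assuming the Excel-reading package
has already handed you the cells as lists of Python strings. The rows
variable below is a stand-in for that extracted data, and rows_to_bytes
is a hypothetical helper name, not part of any library.

```python
import csv
import io

def rows_to_bytes(rows, encoding="utf-8"):
    """Serialize already-extracted rows as CSV, then encode in one step."""
    # newline="" keeps the csv module's own line endings untouched.
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    # The encoding is chosen here, at the output boundary, and nowhere else.
    return buf.getvalue().encode(encoding)

# Stand-in for what an Excel-reading package would return.
rows = [["id", "name"], ["1", "Müller"], ["2", "Søren"]]

utf8_bytes = rows_to_bytes(rows, "utf-8")
utf16_bytes = rows_to_bytes(rows, "utf-16-le")
```

Either result can be written with open(path, "wb"). The extraction code
stays the same whichever encoding you pick for the output.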

I completely agree with the points that (a) if the Excel plug-in is
working well enough, then stick with it; and (b) it wouldn't hurt to
give CCSID 1200 (UTF-16) a try. In fact, it would add value to the
archive if the OP does try it and reports back what the result was.

John Y.

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.