


Mark,

UTF-16 (CCSID 1200) is a newer standard and a superset of UCS-2 (CCSID 13488). So yes, there is a downside to using 13488: it covers fewer characters (nothing outside the Basic Multilingual Plane) and is considered an obsolete standard. There is no downside to using 1200.
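
To make the superset relationship concrete, here is a small Python sketch. Python and its codec names are assumptions brought in for illustration; "utf-16-be" merely stands in for CCSID 1200. A character inside the Basic Multilingual Plane, such as a Hebrew letter, takes one 16-bit unit in both encodings, while a character outside it needs a UTF-16 surrogate pair, which UCS-2 cannot express.

```python
# Sketch: why UTF-16 can hold text that UCS-2 cannot.
# Assumption: Python's "utf-16-be" codec is used as a stand-in for CCSID 1200.

def utf16_code_units(ch):
    """Return how many 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-be")) // 2

def fits_in_ucs2(text):
    """UCS-2 is fixed at one 16-bit unit per character (BMP only)."""
    return all(ord(ch) <= 0xFFFF for ch in text)

alef = "\u05d0"      # HEBREW LETTER ALEF, inside the BMP
clef = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the BMP

for ch in (alef, clef):
    print(f"U+{ord(ch):04X}", utf16_code_units(ch), fits_in_ucs2(ch))
```

For ordinary Hebrew and Latin text the two CCSIDs should therefore hold identical data; the difference only shows up for supplementary characters.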

How do you determine the CCSID of an Excel sheet? If you mean an actual Excel workbook (and not a "save as text"), a .xlsx file stores its text as UTF-8 (CCSID 1208).

It sounded to me like you were having issues with doing a "Save As" to Unicode text and then running CPYFRMIMPF. If you are happy with the Excel plug-in, I guess there's no reason to troubleshoot this further.

I don't think the difference between 1200 and 13488 would have any impact on Hebrew punctuation. But it wouldn't hurt to try it and see.
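
One way to check whether the punctuation problem is a display-ordering matter rather than an encoding one is to look at the Unicode bidirectional classes of the characters involved. The sketch below uses Python's standard unicodedata module (an assumption; any Unicode-aware tool would do) and a made-up sample string. Hebrew letters are strongly right-to-left, while the period, slash, and colon carry no strong direction of their own, so where they land on screen likely depends on the base direction the viewer applies, not on which Unicode CCSID holds the bytes.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Hypothetical sample: a Hebrew word followed by a period, in logical
# (typing) order, which is how Unicode text is stored.
sample = "\u05e9\u05dc\u05d5\u05dd."

for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")

# The stored order is unchanged by a round trip through UTF-16.
round_trip = sample.encode("utf-16-be").decode("utf-16-be")
print(round_trip == sample)
```

If the stored order is intact, the thing to examine is the direction setting of whatever displays the field, which is outside what this thread covers.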

-SK


On 2/29/2016 10:58 PM, mlazarus wrote:
Scott,

At this point I have had reasonable success using the Client Access
plug-in from Excel. The CCSID I'm using is 13488. The key to making it
work was defining the data length the same as the display length.

- Is there any downside to using 13488 vs. 1200?
- How do I determine what CCSID an Excel sheet uses?
- This is a multi-language file. One of the languages is Hebrew, which
reads from right to left. I'm having an issue where trailing
punctuation (e.g. period, slash, colon, etc.) gets shifted to the
beginning of the text string. So far everything else looks good. Would
1200 resolve this issue?

-mark

On 2/29/2016 4:30 PM, Scott Klement wrote:
John,

My RPG wrappers for POI do not have a Unicode option at this point.
(There hasn't been a demand for it.) They accept the data in EBCDIC
format only.

This could be changed in the code, of course. It wouldn't be that
hard, but I haven't worked with POI in a few years now; I really
don't use Excel at my current job.

With regard to Mark's original problem, it kinda sounds like somewhere
along the line the data is being converted to EBCDIC. This is the
tricky part, really. Unicode works brilliantly, but you need to be
careful that nothing along the way tries to translate it to EBCDIC
or ASCII, because those encodings are very limited by comparison
to Unicode.
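
The loss described above can be reproduced in a few lines of Python. This is only a sketch under assumptions: Python's "cp037" codec stands in for a single-byte EBCDIC CCSID, the sample text is made up, and Python's "?" replacement is not the same substitution character IBM i would use. The point is that once Hebrew text passes through such a code page, the original characters cannot be recovered.

```python
def through_ebcdic(text, codec="cp037"):
    """Simulate a hop through a single-byte EBCDIC code page and back.

    Characters the code page lacks are replaced on the way in, so the
    round trip is lossy for anything outside its repertoire.
    """
    return text.encode(codec, errors="replace").decode(codec)

hebrew = "\u05e9\u05dc\u05d5\u05dd"  # hypothetical sample data
latin = "Invoice 42"

print(through_ebcdic(latin) == latin)    # Latin text survives
print(through_ebcdic(hebrew) == hebrew)  # Hebrew text does not
```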

Your files, it sounds like, are using UCS-2 (CCSID 13488), which is an
old version of Unicode, but still certainly much better than EBCDIC.
I'm not sure what "Unicode text" does in Excel, though. Does that
produce UTF-8?

Then you say you drag/drop the file. When you do that, are you going
in and setting the CCSID? IBM i doesn't work like Windows. It
figures out the character encoding based on the CCSID, whereas Windows
looks for stuff like the byte-order mark (which is as flexible,
imho). But since Windows doesn't have a CCSID, when you drag/drop the
files, you'll probably get a default value. You'll want to make sure
you set it to the "right" value (1208 for UTF-8, 1200 for UTF-16,
13488 for UCS-2, though UCS-2 is a subset of UTF-16, so no real reason
to use that.) It's important to set this to whatever flavor of Unicode
Excel has used BEFORE running CPYFRMIMPF.
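
As a rough aid for picking that CCSID, the file's first bytes can be checked for a byte-order mark before it is transferred. The sketch below is an illustration, not an IBM i tool: the function names are hypothetical, and it only maps to the CCSIDs named in this thread. Excel's "Unicode Text" export is generally reported to be UTF-16 little-endian with a BOM, but that should be confirmed against the actual file, and whether a little-endian file needs extra handling under CCSID 1200 is not covered here.

```python
import codecs

# (BOM bytes, description, CCSID named in the thread for that family).
# UTF-32 BOMs are deliberately left out; a UTF-32 little-endian file
# would be misread as UTF-16 little-endian by this simple check.
BOM_TABLE = [
    (codecs.BOM_UTF8, "UTF-8", 1208),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian", 1200),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian", 1200),
]

def sniff_bom(first_bytes):
    """Return (description, ccsid) for a recognised BOM, else None."""
    for bom, description, ccsid in BOM_TABLE:
        if first_bytes.startswith(bom):
            return description, ccsid
    return None

def sniff_file(path):
    """Read the first few bytes of a file and check them for a BOM."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4))

# Simulated start of a UTF-16 little-endian export (made-up content).
sample = codecs.BOM_UTF16_LE + "\u05e9\u05dc\u05d5\u05dd".encode("utf-16-le")
print(sniff_bom(sample))
```

A result of None means no BOM was found, in which case the encoding has to be established some other way.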

Does that help?




On 2/25/2016 10:28 PM, John Yeung wrote:
On Thu, Feb 25, 2016 at 10:59 AM, mlazarus <mlazarus@xxxxxxxx> wrote:
What is the "safest" way to do file transfers with Unicode data?
Meaning, to keep the integrity of the data, not from a security
standpoint.

It kind of depends. (What else is new?) But one of the cases you
brought up is getting data out of an Excel file and into a PF or
database table. A highly safe way of doing that is to put the Excel
file onto the IFS (binary transfers normally go pretty well) and then
read the Excel file on the i.

I use iSeriesPython and xlrd for this; most folks on this list use
Java and POI (usually through Scott Klement's RPG wrappers).

For other file formats, I find the main challenges are (a) knowing
what encoding the data is already in, and what encoding you need the
data to be in when it gets where it's going; and (b) understanding the
*concept* of Unicode. If you can't tackle these, the whole endeavor
reduces to flailing around, trying different encodings at various
points in the process.
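
Point (a) can be illustrated with a short Python sketch. The helper names and sample data are hypothetical. Converting between encodings is a decode with the encoding the bytes are actually in, followed by an encode with the encoding the destination needs; guessing the source encoding wrong either fails outright or silently produces garbage.

```python
import codecs

def transcode(data, source, target):
    """Decode bytes from their real encoding, then encode for the target."""
    return data.decode(source).encode(target)

def can_decode(data, encoding):
    """Report whether the bytes are valid in the guessed encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

text = "\u05e9\u05dc\u05d5\u05dd"  # hypothetical sample data
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Correct: the bytes are UTF-16 with a BOM and the target wants UTF-8.
utf8_bytes = transcode(utf16_bytes, "utf-16", "utf-8")
print(utf8_bytes.decode("utf-8") == text)

# Wrong guess: the same bytes are not valid UTF-8.
print(can_decode(utf16_bytes, "utf-8"))
```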

John Y.



This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.
