Scott,

At this point I have had reasonable success using the Client Access plug-in from Excel. The CCSID I'm using is 13488. The key to making it work was defining the data length to be the same as the display length.

- Is there any downside using 13488 vs. 1200?
- How do I determine what CCSID an Excel sheet is?
- This is a multi-language file. One of the languages is Hebrew, which reads from right to left. I'm having an issue where trailing punctuation (period, slash, colon, etc.) gets shifted to the beginning of the text string. So far everything else looks good. Would 1200 resolve this issue?
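
For the 13488 vs. 1200 and punctuation questions, a small Python 3 sketch may help separate encoding from display order. The Hebrew sample string is illustrative only, not taken from the actual file. It checks two things: that text confined to the Basic Multilingual Plane uses one 16-bit code unit per character (the range where UCS-2 and UTF-16 agree), and that period, slash and colon do not carry a strong direction of their own the way Hebrew letters do. If that matches what is happening here, the shifted punctuation would be a display-order effect rather than a CCSID problem, but that is an assumption to verify, not a diagnosis.

```python
import unicodedata

# Illustrative sample: Hebrew "shalom" followed by a trailing period.
sample = "\u05e9\u05dc\u05d5\u05dd."

# Every code point is in the BMP, so each character is a single
# 16-bit code unit; UCS-2 and UTF-16 are identical in this range.
in_bmp = all(ord(ch) <= 0xFFFF for ch in sample)
utf16_be = sample.encode("utf-16-be")
two_bytes_each = len(utf16_be) == 2 * len(sample)

# Bidirectional class per character: "R" is strong right-to-left,
# "CS" (common separator) is a weak class whose placement depends
# on the surrounding text and the base direction of the display.
bidi = {ch: unicodedata.bidirectional(ch) for ch in "\u05e9./:"}

print(in_bmp, two_bytes_each, bidi)
```

Only characters outside the BMP (surrogate pairs) would be stored differently under the two CCSIDs, which this sample does not contain.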

-mark

On 2/29/2016 4:30 PM, Scott Klement wrote:
John,

My RPG wrappers for POI do not have a Unicode option at this point. (There hasn't been a demand for it.) They accept the data in EBCDIC format only.

This could be changed in the code, of course. It wouldn't be that hard, but I haven't worked with POI in a few years now, and I really don't use Excel at my current job.

With regard to Mark's original problem, it kinda sounds like somewhere along the line the data is being converted to EBCDIC. This is the tricky part, really. Unicode works brilliantly, but you need to be careful that nothing along the way tries to translate it to EBCDIC or ASCII, because those encodings are very limited by comparison to Unicode.
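
The effect of an unintended EBCDIC conversion can be sketched in Python 3. This is an illustration only: Python's `cp037` codec stands in for a single-byte EBCDIC CCSID, and the sample string is made up. The point is that characters outside the EBCDIC code page cannot survive the trip, while a Unicode encoding round-trips cleanly.

```python
# Illustrative sample: Latin text plus Hebrew "shalom".
text = "caf\u00e9 \u05e9\u05dc\u05d5\u05dd"

# A strict conversion to a single-byte EBCDIC code page fails,
# because the Hebrew letters have no code point in cp037.
try:
    text.encode("cp037")
    lossless = True
except UnicodeEncodeError:
    lossless = False

# A lenient conversion "succeeds" but silently substitutes
# a replacement character for everything it cannot represent.
lossy = text.encode("cp037", errors="replace").decode("cp037")

# A Unicode encoding round-trips without loss.
round_trip = text.encode("utf-8").decode("utf-8")

print(lossless, lossy, round_trip == text)
```

Any tool in the transfer chain that behaves like the lenient conversion will damage the data without reporting an error, which is why it helps to check each step separately.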

Your files, it sounds like, are using UCS-2 (CCSID 13488), which is an old version of Unicode, but still certainly much better than EBCDIC. I'm not sure what "Unicode text" does in Excel, though. Does that produce UTF-8?

Then you say you drag/drop the file. When you do that, are you going in and setting the CCSID? IBM i doesn't work like Windows. It figures out the character encoding based on the CCSID, whereas Windows looks for stuff like the byte-order mark (which is as flexible, imho). But since Windows doesn't have a CCSID, when you drag/drop the files, you'll probably get a default value. You'll want to make sure you set it to the "right" value (1208 for UTF-8, 1200 for UTF-16, 13488 for UCS-2, though UCS-2 is a subset of UTF-16, so no real reason to use that.) It's important to set this to whatever flavor of Unicode Excel has used BEFORE running CPYTOIMPF.
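
As a rough illustration of the byte-order-mark approach, here is a Python 3 sketch that looks at the first bytes of a file and suggests a CCSID to assign. The function name `guess_ccsid` is hypothetical, and the CCSID numbers are the ones given above. The byte order is reported separately because nothing above says which byte order CCSID 1200 expects, so treat that part as something to verify on your system.

```python
import codecs

# (BOM bytes, candidate CCSID from the message above, description)
# UTF-32 BOMs are not handled in this sketch.
BOM_TO_CCSID = [
    (codecs.BOM_UTF8, 1208, "UTF-8"),
    (codecs.BOM_UTF16_LE, 1200, "UTF-16 (little-endian)"),
    (codecs.BOM_UTF16_BE, 1200, "UTF-16 (big-endian)"),
]


def guess_ccsid(first_bytes):
    """Return (ccsid, description) if a known BOM is present, else None."""
    for bom, ccsid, description in BOM_TO_CCSID:
        if first_bytes.startswith(bom):
            return ccsid, description
    return None


# Usage with in-memory data; a real file would be opened in binary
# mode and only its first few bytes read.
print(guess_ccsid(codecs.BOM_UTF8 + b"abc"))
print(guess_ccsid(codecs.BOM_UTF16_LE + "abc".encode("utf-16-le")))
print(guess_ccsid(b"abc"))
```

A file with no BOM returns `None`, in which case the encoding has to come from knowing how the file was produced.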

Does that help?




On 2/25/2016 10:28 PM, John Yeung wrote:
On Thu, Feb 25, 2016 at 10:59 AM, mlazarus <mlazarus@xxxxxxxx> wrote:
What is the "safest" way to do file transfers with Unicode data? Meaning,
to keep the integrity of the data, not from a security standpoint.

It kind of depends. (What else is new?) But one of the cases you
brought up is getting data out of an Excel file and into a PF or
database table. A highly safe way of doing that is to put the Excel
file onto the IFS (binary transfers normally go pretty well) and then
read the Excel file on the i.

I use iSeriesPython and xlrd for this; most folks on this list use
Java and POI (usually through Scott Klement's RPG wrappers).

For other file formats, I find the main challenges are (a) knowing
what encoding the data is already in, and what encoding you need the
data to be in when it gets where it's going; and (b) understanding the
*concept* of Unicode. If you can't tackle these, the whole endeavor
reduces to flailing around, trying different encodings at various
points in the process.
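
Point (a) can be illustrated with a short Python 3 sketch. The sample text and the choice of Latin-1 as the "wrong guess" are assumptions made for the example. It shows that the same text is stored as different bytes under different Unicode encodings, and that decoding with the wrong encoding garbles the display without damaging the underlying bytes.

```python
# Illustrative sample: Hebrew "shalom".
text = "\u05e9\u05dc\u05d5\u05dd"

as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-be")

# Same text, different bytes: the receiver must know which was used.
different_bytes = as_utf8 != as_utf16

# Decoding UTF-8 bytes under the wrong assumption yields garbage.
garbled = as_utf8.decode("latin-1")

# The bytes were never damaged, so the right decoding recovers the text.
recovered = garbled.encode("latin-1").decode("utf-8")

print(different_bytes, garbled != text, recovered == text)
```

The garbled result looks like corruption, but the fix is to decode correctly, not to re-transfer the file.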

John Y.



This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
