Gary

I went through this exercise not that long ago - there should be a thread on it here.

I did come up with a "good-enough-for-here" way of determining which encoding is used, in full knowledge that correct results cannot be guaranteed.

Basic idea - see if there is a BOM (byte order mark). There is one for UTF-8 (not really needed, but there it is) and two for UTF-16 (BE and LE). If one is there, ba-da-bing.
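
To illustrate the BOM check, here is a minimal Python sketch. It is not the code used on the IBM i side; the name `sniff_bom` is made up for this example, and only the three BOMs mentioned above are checked.

```python
import codecs

# BOM byte sequences mapped to Python codec names.
# Assumption: only UTF-8 and the two UTF-16 BOMs matter here; a UTF-32 LE
# file also starts with FF FE and would be reported as UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (codec_name, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

The returned length lets the caller skip the BOM before decoding, for example `data[n:].decode(name)`.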

If there is no BOM, see if there are NULLs in the text - those "empty spots" between characters are usually x'00'. If there are, it is UTF-16. I determined that it doesn't matter whether it is BE or LE - I just strip out all the NULLs and it's good. I know, this is not really correct for all purposes, but it works with what we have here. It will need review when languages other than the Latin-1 type European languages have to be handled. None of that at this time.
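
A small Python sketch of the NULL heuristic, showing both why it works for Latin-1 range text and where it stops working. The function names are hypothetical and this is only an illustration of the idea, not the production approach.

```python
def looks_like_utf16(data: bytes) -> bool:
    """Heuristic from the post: a NUL byte anywhere suggests UTF-16."""
    return b"\x00" in data


def strip_nuls(data: bytes) -> bytes:
    """Shortcut from the post: drop every NUL byte, ignoring endianness."""
    return data.replace(b"\x00", b"")


# For characters up to U+00FF the high byte is always x'00', so stripping
# NULs leaves the Latin-1 bytes, whichever byte order was used.
word = "caf\u00e9"
from_be = strip_nuls(word.encode("utf-16-be")).decode("latin-1")
from_le = strip_nuls(word.encode("utf-16-le")).decode("latin-1")

# Outside Latin-1 it breaks down: the euro sign U+20AC is x'20AC' in
# UTF-16 BE, which contains no NUL, so the heuristic does not even fire.
euro_detected = looks_like_utf16("\u20ac".encode("utf-16-be"))
```

Here `from_be` and `from_le` both come back as the original word, while `euro_detected` is false, which is the kind of case that would need the review mentioned above.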

Finally, if there is no BOM and there are no NULLs, see if it can be UTF-8. On the net I saw the idea of doing some kind of conversion that errors out if there are invalid UTF-8 sequences. In this case, I do a binary copy of the original to an IFS file tagged as CCSID 1208. Our files always arrived tagged with CCSID 1252, coming from Windows, so we had to jump through these hoops.

Then I run a CPYFRMSTMF of the 1208 copy - if it fails, I decide it is ANSI; if it is OK, it is UTF-8.
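
The same "try it as UTF-8 and see whether it fails" test can be sketched in Python with a strict decode standing in for the CPYFRMSTMF step. This is a stand-in, not the CL procedure described above; `utf8_or_ansi` is a hypothetical name, and "ANSI" is assumed to mean Windows code page 1252.

```python
def utf8_or_ansi(data: bytes) -> str:
    """Guess 'utf-8' if the bytes are valid UTF-8, otherwise 'cp1252'."""
    try:
        # A strict decode raises on any invalid UTF-8 byte sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "cp1252"
    return "utf-8"


# An accented character is two bytes in UTF-8 but a single byte (x'E9')
# in code page 1252, and a lone x'E9' is not valid UTF-8.
guess_utf8 = utf8_or_ansi("caf\u00e9".encode("utf-8"))
guess_ansi = utf8_or_ansi("caf\u00e9".encode("cp1252"))
```

Pure 7-bit ASCII passes the UTF-8 check as well, which is harmless because the bytes mean the same thing in both encodings.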

If someone is tempted to tell me I'm nuts, please go to the other thread - see if you have something new to say! :) I got lots of great information from y'all (wishing I were in Austin, trying the accent) and lots of caution recommended. All appreciated. So, at least as far as I can tell, these choices are informed. And I'm glad to hear of better ways.

BTW, Notepad seems always to put on the BOM for UTF-8 and for UTF-16 of either endianness. TextPad has a configuration setting that controls this.

Using this, we don't need anyone to re-save anything in a particular encoding, and we can more easily automate the process, which includes reading a POP3 mailbox and getting the attachments.

HTH
Vern

----- Original Message -----
Chuck,
Thank you for the FROMSTMF clarification, we will try the conversion
as soon as my co-worker is available.

Here is a link to the page I found on CCSID 1201:
http://www-01.ibm.com/software/globalization/ccsid/ccsid1201.html

To John Yeung's point about UTF-16 BE - I am ignorant about this topic,
but made my assumption/statement because the files we want to convert
are from "quick and dirty" queries run in MS SQL Server Studio 2005.

We run various queries and sometimes find it handy to right-click
in the results pane and select "Save Results As..."

These files always appear to be in the same "double byte character format";
what that format is, I don't know.

Viewing the saved files with Notepad we see what appears to be a good
candidate for CPYFRMIMPF FROMSTMF - but, so far, we have only had success
using the MS Excel "Text to Columns" function to create a ".csv".

As Joe Wood commented, it will be a real time-saver when we get the
"character conversion" problem solved.

Thanks everyone - we will post what we learn

-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of CRPence
Sent: Tuesday, April 09, 2013 7:56 PM
To: midrange-l@xxxxxxxxxxxx
Subject: Re: CPYFRMIMPF FROMCCSID(1201)

On 09 Apr 2013 16:37, Gary Thompson wrote:
> My first attempt at "consuming" a .txt file in, I think, Unicode
> format did not work as expected.
>
> CPYFRMIMPF did copy the data to the local EBCDIC file, but the data
> was still "double byte" (an "extra" space between each character).
>
> I tried both
> FROMCCSID(1201)
> FROMCCSID(1200)
>
> CCSID is described as UTF-16 BE and it seems to be a Windows
> "standard" for text files.

AFaIK the FROMCCSID parameter applies to the FROMFILE, not the FROMSTMF [¿unless the FROMSTMF CCSID is 65535?]. See the doc link after the two archived message links. If the results are not as described, e.g. no error when the /from-file/ [as FROMSTMF] is not *HEX, then report the contradiction between the effects and the doc as a defect.

Note: I do not know anything about CCSID(1201), I know only of
CCSID(1200) as UNICODE UTF-16BE. There is no mention of 1201 here:
http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/topic/rzaha/fileenc.htm

Correctly tag the CCSID for the FROMSTMF [e.g. CHGATR], and the problem will be resolved; well, it has always worked for me anyhow, since v5r3. For example, as described in the recent messages:

http://archive.midrange.com/midrange-l/201303/msg00039.html
"... The CCSID for the FROMSTMF is always the CCSID of the STMF. Thus if the stream files are improperly tagged, then improper effects are to be expected. To correct the CCSID tagging, use the following request:
CHGATR OBJ(the_file) ATR(*CCSID) VALUE(1200) ..."

http://archive.midrange.com/midrange-l/201303/msg00084.html
"... Again, that issue [almost positively] is due to the incorrect tagging; i.e. wrong CCSID. When the STMF is tagged with something other than 1200 but its data is UTF-16BE, then the copy feature does not work.
So if the file is incorrectly tagged with CCSID-1252 or CCSID-819, then CPYFRMSTMF does not know you lied, and tries to convert the data based on that lie, then the effect will *appear to be* byte-by-byte conversion. That is because the feature has no idea that the data is two-byte characters, when the CCSID says they are not. Unlike the CPYFRMIMPF however, the STMFCCSID [or STMFCODPAG on older releases] can be used to override the STMF CCSID to 1200" <ed: addendum:> on the CPYFRMSTMF command.
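
A short Python sketch of why mis-tagged UTF-16BE data looks like "an extra space between each character". It only mimics the effect with Python codecs (`cp1252` standing in for a stream file tagged CCSID 1252); it is not how CPYFRMSTMF is implemented, and the sample text is made up.

```python
text = "AB,12"

# The bytes as a UTF-16 BE file would hold them: x'00' before each character.
raw = text.encode("utf-16-be")

# Decoding under the wrong tag treats every byte as its own character,
# so each real character is preceded by a NUL.
wrong = raw.decode("cp1252")

# Decoding under the correct tag recovers the original text.
right = raw.decode("utf-16-be")
```

`wrong` is twice as long as `text`, with a NUL in front of every character, and a NUL typically displays as a blank, while `right` equals `text`.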

http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/topic/cl/cpyfrmimpf.htm
_i Copy From Import File (CPYFRMIMPF) i_ "The Copy From Import File (CPYFRMIMPF) command copies all or part of an import file to the TOFILE. The term import file is used to describe a file created for purposes of copying data between heterogeneous databases. The import file (FROMSTMF or FROMFILE parameter) is called the from-file for this command.
...

From CCSID (FROMCCSID)

Specifies the coded character set identifier (CCSID) of the from-file.

*FILE
The from-file CCSID is used. If the from-file is a tape file, the job's default CCSID is used.

1-65533
Specify the CCSID to be used when the CCSID of the from-file is 65535, or if the from-file is a tape file. If the from-file CCSID is not 65535, or the from-file is not a tape file, an error message will be sent.

..."

--
Regards, Chuck
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list To post a message email: MIDRANGE-L@xxxxxxxxxxxx To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx Before posting, please take a moment to review the archives at http://archive.midrange.com/midrange-l.





This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
