Re: Ascii and EBCDIC translation -- MIDRANGE-L

Scott

I really don't expect to spend a lot of effort configuring a PC client to
transfer data in the right format to the IFS. Besides, UTF-8 needs special
recognition; otherwise it's just an ASCII file.

On Fri, May 8, 2015 at 9:15 PM, Scott Klement <midrange-l@xxxxxxxxxxxxxxxx> wrote:

> Henrik,
>
> I believe those CCSIDs are configurable. Also, in some circumstances IBM
> i will try to pick the CCSID based on other factors (such as choosing an
> ASCII CCSID that supports the same set of characters as the EBCDIC CCSID
> you're using).
>
> So I wouldn't say they'd always be 819 or 1252, but I think in most
> Western countries (those using the Latin-1 character set) it'll probably be
> 819 and 1252 unless someone has reconfigured them.
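To see why the 819 vs. 1252 distinction matters, here is a minimal Python sketch (not IBM i code). It assumes Python's built-in `latin-1` codec is equivalent to CCSID 819 and `cp1252` to CCSID 1252. The two agree on every byte except x'80' through x'9F', which is exactly where Windows puts smart quotes and the euro sign.

```python
# Assumption: Python's "latin-1" codec matches CCSID 819 (ISO 8859-1) and
# "cp1252" matches CCSID 1252 (Windows Latin-1).
raw = bytes([0x41, 0xE9, 0x93, 0x80, 0x94])  # 'A', e-acute, then three bytes from x'80'-x'9F'

as_819 = raw.decode("latin-1")   # x'93', x'80', x'94' become invisible C1 control characters
as_1252 = raw.decode("cp1252")   # the same bytes become a left quote, euro sign, right quote
print(ascii(as_819))
print(ascii(as_1252))

# List every byte value on which the two code pages disagree.
# errors="replace" covers the five byte values that cp1252 leaves undefined.
diff = [
    b for b in range(256)
    if bytes([b]).decode("latin-1") != bytes([b]).decode("cp1252", errors="replace")
]
print(f"{len(diff)} byte values differ, from {min(diff):#04x} to {max(diff):#04x}")
```

So a 1252 file mis-tagged as 819 (or the reverse) only goes wrong in that one 32-byte range, which is why the problem tends to show up only on quotes, dashes and currency symbols.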
>
> Chuck Pence will probably reply to this and tell you precisely how it
> works :-)
>
> -SK
>
>
>
> On 5/8/2015 2:06 PM, Henrik Rützou wrote:
>
>> Scott
>>
>> In general, can we say that files arriving in the IFS via FTP will get
>> CCSID 819, while files dragged and dropped from Windows will get CCSID
>> 1252? Or am I wrong?
>>
>>
>> On Fri, May 8, 2015 at 8:59 PM, Scott Klement <midrange-l@xxxxxxxxxxxxxxxx> wrote:
>>
>>> Jim,
>>>
>>> In file transfer situations, I would never trust the CCSID file attribute
>>> (unless you've already made sure that it's right, of course).
>>>
>>> Unless you're transferring a save file from another IBM i
>>> system/partition, the CCSID is not part of what gets transferred. All
>>> that's transferred is the data itself. The system will usually just assign
>>> a 'default' CCSID; it has no way of knowing if it's the right one for
>>> your data. It expects you to change it accordingly if your data is
>>> different.
>>>
>>> If you are finding that a single character (such as a "smart quote" or
>>> international symbol) is showing up as two bytes of data, resulting in
>>> extra 'garbage' when translated to EBCDIC, this almost always means that
>>> the data is UTF-8, but you're telling the system that it's ASCII (such as
>>> 819) and therefore it will translate the basic alphabet and numbers
>>> correctly, but more 'special' characters will be mistranslated.
>>>
>>> Really, considering that it's 2015, we should all be using Unicode (UTF-8
>>> or UTF-16) for as much as possible. ASCII and EBCDIC are really
>>> cumbersome. I know it's hard when you have so many applications that are
>>> already in EBCDIC, but an all-Unicode environment is really what you
>>> should be striving for in the long run, even if you can't do it today.
>>>
>>> Anyway, as for how to "purify" the data: there are certain commonplace
>>> fixes that make sense to do, such as replacing "smart quotes" with
>>> straight quotes. I would definitely do this in Unicode (or ASCII, if
>>> that's what the data is) before translating to EBCDIC.
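A sketch of that cleanup step in Python, done on Unicode text before the EBCDIC conversion. `SMART_PUNCTUATION` and `normalize_punctuation` are hypothetical names, which characters to map is a per-application decision, and `cp037` is assumed to stand in for CCSID 37. Without the normalization, encoding a curly quote to `cp037` raises an error because CCSID 37 has no such character.

```python
# Hypothetical mapping table; extend it with whatever typographic
# characters your trading partners actually send.
SMART_PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def normalize_punctuation(text: str) -> str:
    """Replace common 'smart' punctuation with plain ASCII equivalents.
    Works on decoded Unicode text, before any EBCDIC conversion."""
    return text.translate(SMART_PUNCTUATION)


cleaned = normalize_punctuation("\u201cIt\u2019s done\u201d \u2013 Jim\u2026")
ebcdic = cleaned.encode("cp037")  # assumption: cp037 stands in for CCSID 37
print(cleaned)
```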
>>>
>>> But aside from these common things, it's generally ugly and nasty to
>>> remove "unwanted" characters. There's no good way to do this, since the
>>> computer has no way of knowing which characters are "allowed" and which
>>> are not. How does it know whether a half-moon character, for example, is
>>> intentional or an error? The same is true of accented characters:
>>> oftentimes people (at least in the USA) will see these and call them
>>> "garbage", but they are normal parts of human languages in most of the
>>> world. How can the computer know that they are "garbage"? Obviously, it's
>>> easy for us as human beings to look at the data and realize that a
>>> particular character doesn't belong there, but I'm sure you understand
>>> that a computer can't see things that way.
>>>
>>> So I guess if you want to "purify" your data, the BEST way to do that is
>>> to find out where these unwanted characters are coming from, and have it
>>> stop sending them.  If you really, truly, can't do that then the "hack"
>>> would be to make a list of everything you DO want, and remove everything
>>> else.  What is/isn't a wanted character will almost certainly vary from
>>> application to application, so there isn't really any built-in way to do
>>> this.  Just make a string of all the characters you want, and use RPG
>>> operations like %CHECK to find the ones not in that character set and
>>> remove them.  But, this really is a hack...
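For readers not working in RPG, here is the same allow-list idea in Python. `first_bad_position` is only a rough analogue of %CHECK (1-based position of the first character not in the allowed set, or 0 if there is none), and the `ALLOWED` set is a hypothetical example, not a recommendation.

```python
import string

# Hypothetical allow-list: printable US-ASCII plus the copyright and cent
# signs Jim mentioned. What belongs here varies by application.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")
ALLOWED |= {"\u00a9", "\u00a2"}  # copyright sign, cent sign


def first_bad_position(text: str) -> int:
    """Rough analogue of RPG %CHECK: 1-based position of the first
    character that is not in ALLOWED, or 0 if all characters are allowed."""
    for pos, ch in enumerate(text, start=1):
        if ch not in ALLOWED:
            return pos
    return 0


def strip_unwanted(text: str, replacement: str = "") -> str:
    """Drop (or replace) every character that is not on the allow-list."""
    return "".join(ch if ch in ALLOWED else replacement for ch in text)


sample = "Price: 5\u00a2\u00f0"  # ends with an unexpected eth
print(first_bad_position(sample))
print(ascii(strip_unwanted(sample)))
```

Passing a blank as `replacement` keeps field lengths stable, which may matter for fixed-length database columns.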
>>>
>>>
>>>
>>> On 5/8/2015 1:33 PM, Jim Franz wrote:
>>>
>>>> Without asking every entity, can one tell by looking at the file
>>>> attributes?
>>>>
>>>> Jim
>>>>
>>>> On Fri, May 8, 2015 at 2:28 PM, Henrik Rützou <hr@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Jim,
>>>>>
>>>>> Even if the files you receive are tagged CCSID 819/1252, are you sure
>>>>> they aren't UTF-8 files?
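One way to act on Henrik's question without trusting the CCSID tag is to test the bytes themselves. This is a heuristic sketch in Python, not an IBM i API: valid multi-byte UTF-8 sequences rarely occur by accident in 819/1252 text, so data that contains non-ASCII bytes and still decodes cleanly as UTF-8 is very probably UTF-8. It can still guess wrong on short or unusual samples.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Heuristic guess for a file whose CCSID tag can't be trusted.

    Returns "utf-8-sig", "ascii", "utf-8", or "single-byte" (e.g. 819/1252).
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"          # UTF-8 byte-order mark: strong signal
    if all(b < 0x80 for b in data):
        return "ascii"              # 7-bit data reads the same in ASCII, 819, 1252 and UTF-8
    try:
        data.decode("utf-8")        # strict decode fails on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "single-byte"        # probably an ASCII code page such as 819 or 1252


print(guess_encoding("Caf\u00e9".encode("utf-8")))   # UTF-8 bytes C3 A9 for e-acute
print(guess_encoding("Caf\u00e9".encode("cp1252")))  # lone x'E9' is not valid UTF-8
```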
>>>>>
>>>>>
>>>>> On Fri, May 8, 2015 at 8:25 PM, Jim Franz <franz9000@xxxxxxxxx> wrote:
>>>>>
>>>>>> EBCDIC CCSID = 37
>>>>>>
>>>>>> Most file imports are via FTP (CCSID 1252), occasionally a burned DVD
>>>>>> with history for a new customer startup.
>>>>>> Some trading partners are mainframe, some Unix/Linux, some Windows, all
>>>>>> US-based entities, but we think some servers are overseas (we see time
>>>>>> differences).
>>>>>>
>>>>>> When we write ASCII text, it's usually 819.
>>>>>>
>>>>>> What hurts us most is screen input (web interface to SQL Server, then to
>>>>>> Power i), where users cut and paste paragraphs of text from their source
>>>>>> systems (thousands of different customers).
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, May 8, 2015 at 2:07 PM, Henrik Rützou <hr@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Jim,
>>>>>>>
>>>>>>> What is the EBCDIC CCSID on your machine, and how do you receive
>>>>>>> files?
>>>>>>>
>>>>>>> On Fri, May 8, 2015 at 8:00 PM, Jim Franz <franz9000@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> We do a lot of import and export of data, plus have both PC client
>>>>>>>> (local and web) input as well as PC5250.
>>>>>>>> Had a recent thread involving cut-and-paste data (EBCDIC x'3F') that
>>>>>>>> caused an issue.
>>>>>>>> We use CCSID 37 and ASCII 819.
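The x'3F' Jim mentions is the EBCDIC substitution character: when a CCSID conversion meets a character that has no equivalent in the target code page, it typically writes x'3F' in its place. Python's `cp037` codec raises an error instead, so the hypothetical `to_ccsid37` helper below imitates the substitution by mapping unmappable characters to U+001A, which `cp037` encodes as x'3F'. `find_substitutions` is likewise a made-up name for locating those bytes afterwards.

```python
# Assumption: Python's "cp037" codec stands in for CCSID 37.
# cp037 maps U+001A (SUB) to x'3F', so routing unmappable characters
# through U+001A reproduces the substitution byte.
SUB = "\x1a"


def to_ccsid37(text: str) -> bytes:
    """Hypothetical helper: convert to CCSID 37, substituting x'3F'
    for characters that have no CCSID 37 equivalent."""
    chars = []
    for ch in text:
        try:
            ch.encode("cp037")
            chars.append(ch)
        except UnicodeEncodeError:
            chars.append(SUB)
    return "".join(chars).encode("cp037")


def find_substitutions(ebcdic: bytes) -> list:
    """0-based offsets of x'3F' substitution bytes in CCSID 37 data."""
    return [i for i, b in enumerate(ebcdic) if b == 0x3F]


pasted = "Smart \u201cquotes\u201d"   # curly quotes, as pasted from a word processor
print(find_substitutions(to_ccsid37(pasted)))
```

Once a character has become x'3F' the original is gone, which is one more argument for cleaning pasted text while it is still Unicode.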
>>>>>>>>
>>>>>>>> There are more EBCDIC characters than what we see on the US keyboard.
>>>>>>>> Some we need, such as the copyright symbol, cent sign, etc., but many
>>>>>>>>
>>>>>>>> We want to take steps to clean the data on input, whether from the
>>>>>>>> ASCII or EBCDIC side. We have some input already cleansed, but only at
>>>>>>>> the screen program level.
>>>>>>>>
>>>>>>>> A couple of questions:
>>>>>>>> 1. Just replacing everything below EBCDIC x'40' leaves a lot of strange
>>>>>>>> characters like x'8C' (sort of a moon with a hat). One thought is to
>>>>>>>> identify all the characters we need and replace the rest. There's no
>>>>>>>> need to keep line and page formatting characters.
>>>>>>>> Is this a good idea?
>>>>>>>>
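A small Python illustration of question 1, assuming `cp037` matches CCSID 37. A byte filter on "below x'40'" catches control and substitution bytes, but x'8C' is a printable character in CCSID 37 (Python's table decodes it as a Latin small letter eth), so only an allow-list removes it. The record and the `WANTED` set are hypothetical.

```python
# Assumption: Python's "cp037" codec stands in for CCSID 37.
# Hypothetical record: "Hi", a x'3F' substitution byte, x'8C', a blank, "A".
record = bytes([0xC8, 0x89, 0x3F, 0x8C, 0x40, 0xC1])

# Approach 1: blank out every byte below x'40'. This catches x'3F' but
# leaves x'8C', which CCSID 37 treats as a printable character.
low_removed = bytes(b if b >= 0x40 else 0x40 for b in record)
print(ascii(low_removed.decode("cp037")))

# Approach 2: allow-list. Decode, keep only wanted characters, re-encode.
WANTED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "abcdefghijklmnopqrstuvwxyz"
             "0123456789 .,'-&/#\u00a9\u00a2")  # hypothetical list
decoded = record.decode("cp037")
allow_listed = "".join(ch if ch in WANTED else " " for ch in decoded).encode("cp037")
print(ascii(allow_listed.decode("cp037")))
```

Replacing with a blank rather than deleting keeps the record length unchanged; whether that is right depends on the column.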
>>>>>>>> 2. Given the multitude of entry/update points, are DB triggers best? I'm
>>>>>>>> wondering about apps that write the data: after the write, the column
>>>>>>>> data on screen is different from the column data in the file (because
>>>>>>>> the trigger program cleaned it). I'm hoping to avoid opening up all the
>>>>>>>> apps.
>>>>>>>>
>>>>>>>> 3. How far do people with heavy EDI take this? Am I leaving something
>>>>>>>> out with the keyboard characters plus a few more? These are names,
>>>>>>>> addresses, and notes (which are sometimes pages of notes).
>>>>>>>>
>>>>>>>> Jim Franz
>>>>>>>> --
>>>>>>>> This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
>>>>>>>> To post a message email: MIDRANGE-L@xxxxxxxxxxxx
>>>>>>>> To subscribe, unsubscribe, or change list options,
>>>>>>>> visit: http://lists.midrange.com/mailman/listinfo/midrange-l
>>>>>>>> or email: MIDRANGE-L-request@xxxxxxxxxxxx
>>>>>>>> Before posting, please take a moment to review the archives
>>>>>>>> at http://archive.midrange.com/midrange-l.
>>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Henrik Rützou
>>>>>>>
>>>>>>>    http://powerEXT.com <http://powerext.com/>
>>>>> --
>>>>> Regards,
>>>>> Henrik Rützou
>>>>>
>>>>>    http://powerEXT.com <http://powerext.com/>
>>
>>