Re: Interesting results when writing to an IFS file -- RPG400-L

Scott,

There is no such thing as ASCII in JSON. The standard requires:

6. IANA Considerations

The MIME media type for JSON text is application/json.

Type name: application

Subtype name: json

Required parameters: n/a

Optional parameters: n/a

Encoding considerations: 8bit if UTF-8; binary if UTF-16 or UTF-32

JSON may be represented using UTF-8, UTF-16, or UTF-32. When JSON
is written in UTF-8, JSON is 8bit compatible. When JSON is
written in UTF-16 or UTF-32, the binary content-transfer-encoding
must be used.
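
To put some numbers on the quoted section, here is a small Python sketch (not RPG, and not from the standard itself) that serializes one JSON text and encodes it in the three Unicode forms mentioned above. The sample value `doc` is made up for illustration. It also shows that fully escaped output happens to be pure 7-bit data, which is still valid UTF-8.

```python
import json

# Hypothetical sample document; any JSON value would do.
doc = {"key": "é"}

# ensure_ascii=False keeps the non-ASCII character as-is instead of escaping it.
text = json.dumps(doc, ensure_ascii=False)

utf8 = text.encode("utf-8")       # 8bit-compatible form
utf16 = text.encode("utf-16-le")  # binary form, 2 bytes per code unit
utf32 = text.encode("utf-32-le")  # binary form, 4 bytes per character

print(len(text), len(utf8), len(utf16), len(utf32))

# With the default escaping, every byte is below 0x80, so the same bytes
# are both "ASCII" and valid UTF-8.
escaped = json.dumps(doc).encode("utf-8")
print(escaped)
```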

On Wed, Apr 13, 2011 at 11:45 PM, Scott Klement
<rpg400-l@xxxxxxxxxxxxxxxx> wrote:

Hi Pete,

This is pretty much a classic CCSID problem. Unfortunately, you haven't
provided much information with which to troubleshoot it.

I don't like the way you're calling the open() API... you have this:

     c                   eval      fd = open('/tmp/' + %trim(INFile) :
     c                             O_CREAT+O_TRUNC+O_CODEPAGE+O_WRONLY:
     c                             S_IRWXU+S_IRWXG+S_IROTH: 819)

I don't think that's related to the problem at all, but I'd do it
differently. I don't understand why you're using CCSID 819... I mean,
JSON isn't typically ASCII, it's UTF-8! But I guess this would be the
best you could do if you're still running v4r5 or older.

If you're on a relatively recent release (I think this code requires v5r2
or later), please consider doing something like this:

     D UTF8            c                   const(1208)
     D JOB             c                   const(0)

      ** make sure file doesn't exist
     c                   callp     unlink('/tmp/' + %trim(INFile))

      ** create new file with CCSID 1208
     c                   eval      fd = open('/tmp/' + %trim(INFile)
     c                                  : O_WRONLY
     c                                  + O_TEXTDATA
     c                                  + O_CREAT
     c                                  + O_EXCL
     c                                  + O_CCSID
     c                                  + O_TEXT_CREAT
     c                                  : S_IRWXU+S_IRWXG+S_IROTH
     c                                  : UTF8 : JOB )

This uses CCSIDs instead of codepages, and thus will support UTF-8. It
also forces the proper CCSID instead of making assumptions about the
existing file.
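
For readers who want to see the "delete, then create exclusively" pattern outside of RPG, here is a rough Python analogue. It is only a sketch: the IBM i specific flags (O_CCSID, O_TEXTDATA, O_TEXT_CREAT) have no counterpart here, so the UTF-8 encoding is done explicitly in the program instead of by the file system. The helper name `create_fresh_utf8` and the demo file name are made up for illustration.

```python
import os
import stat
import tempfile

def create_fresh_utf8(path, text):
    """Remove any existing file, create a new one exclusively, write UTF-8."""
    try:
        os.unlink(path)              # make sure the file doesn't exist
    except FileNotFoundError:
        pass                         # nothing to delete, which is fine
    # O_EXCL makes the open fail if the file somehow still exists, so we
    # never inherit the attributes of an older file.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    mode = stat.S_IRWXU | stat.S_IRWXG | stat.S_IROTH
    fd = os.open(path, flags, mode)
    with os.fdopen(fd, "wb") as out:
        out.write(text.encode("utf-8"))

# Demo in a throwaway directory: the second call replaces the first file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.json")
create_fresh_utf8(demo_path, '{"old": true}')
create_fresh_utf8(demo_path, '{"new": true}')
with open(demo_path, "rb") as f:
    print(f.read())
```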

However, your actual problem appears to be a CCSID discrepancy
somewhere. One problem with the curly brace (square bracket as well) is
that it has a different hex value in different versions of EBCDIC.
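
A quick way to see this for yourself, away from the IBM i, is to ask Python for the byte each character gets in a few EBCDIC code pages. This is just an illustrative sketch: I'm assuming Python's `cp037`, `cp500` and `cp273` codecs line up with CCSIDs 37, 500 and 273, and the helper name `ebcdic_hex` is made up.

```python
# Python codec names assumed to correspond to CCSIDs 37, 500 and 273.
CODE_PAGES = ["cp037", "cp500", "cp273"]

def ebcdic_hex(ch, codec):
    """Return the hex value of one character in the given EBCDIC codec."""
    return ch.encode(codec).hex().upper()

# Print one row per variant character, one column per code page.
for ch in "{}[]":
    row = "  ".join(f"{cp}=x'{ebcdic_hex(ch, cp)}'" for cp in CODE_PAGES)
    print(ch, row)
```

The square brackets already differ between cp037 and cp500, and the curly braces differ once a national code page such as cp273 is involved.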

For example, if your terminal is one CCSID and your job is another, then
the wrong hex code might be written to the file. Maybe your 5250
terminal thinks the curly brace is x'B8'. And maybe in your job CCSID,
it's actually x'D0'. When you type the character, it's x'B8', and it
gets written as such into your source member. When you view it, it
looks correct, because x'B8' looks like a curly brace in your terminal.
But it's wrong, because the job thinks curly brace is x'D0'! In CCSID
37, x'B8' is a 1/2 symbol. So when it goes through translation to CCSID
819, it gets translated to a 1/2 symbol in ASCII, and so when you view
the ASCII file, it looks like a 1/2 symbol.
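
The scenario above can be replayed in a few lines of Python. This is only a simulation of the guess, assuming Python's `cp037` codec stands in for CCSID 37 and `latin-1` for CCSID 819; the byte x'B8' is the hypothetical value the terminal stored.

```python
# The byte the (hypothetical) terminal stored for what it showed as a brace.
stored = bytes([0xB8])

# The job interprets that byte using CCSID 37 ...
seen_by_job = stored.decode("cp037")
print(seen_by_job)                   # the 1/2 symbol, not a brace

# ... and the IFS translation to CCSID 819 faithfully preserves the mistake.
written = seen_by_job.encode("latin-1")
print(written.hex())

# For comparison: the byte CCSID 37 really uses for the right curly brace.
correct = bytes([0xD0]).decode("cp037").encode("latin-1")
print(correct)
```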

I don't know if that's really what's happening... I just don't have
that much information. I'm just coming up with a plausible guess.

The discrepancy might have nothing to do with your terminal. It might
be between your source code and the job running the program. Or it
might be elsewhere. What's for sure is that it's wrong somewhere.

Personally, I don't like coding variant characters the way you've done
it. Instead, I'd code them by their UCS-2 Unicode hex values. The
UCS-2 values are always the same, everywhere... the hex value is
consistent around the world. Then, you can ask RPG to translate them
on-the-fly to the job CCSID, so the result should always match the CCSID
the job is actually running under.

For example, right now you have this:

     D lb              S              1A   Varying
     D                                     Inz('{')
     D rb              S              1A   Varying
     D                                     Inz('}')

I don't see why these are variables instead of constants (do they
change?), and I don't see why they are VARYING (will they ever not be 1
byte long?). But above all, I wouldn't type the human-readable EBCDIC
characters. I'd use a Unicode hex value. Like this:

     D lb              c                   u'007b'
     D rb              c                   u'007d'

To clarify: I would not do this purely for the lb/rb constants, I'd do
it for all of them. But I figured one example would be enough to give
you the idea.

Then when you want to assign it to your character string, you can do this:

Line = whatever + %char(lb) + whatever + %char(rb) ;

The %char() BIF will convert the UCS-2 constants into EBCDIC characters
at run-time, and the result should match the current job's CCSID. Since
the IFS file is translated from the job's CCSID, this should produce the
right UTF-8 values.
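
To show why the Unicode constants are safer, here is a Python sketch of the same two-step path: Unicode to job CCSID (roughly what %char() does), then job CCSID to UTF-8 (roughly what the IFS translation does). The codec names standing in for job CCSIDs and the helper `to_utf8_via_job` are my own assumptions for illustration, not anything from RPG.

```python
# Braces defined by Unicode value, like u'007b' and u'007d' in the D-specs.
LB = "\u007b"
RB = "\u007d"

def to_utf8_via_job(text, job_codec):
    """Simulate Unicode -> job CCSID -> UTF-8 for a given job codec."""
    job_bytes = text.encode(job_codec)           # analogous to %char()
    return job_bytes.decode(job_codec).encode("utf-8")  # analogous to IFS write

line = LB + '"x": 1' + RB

# Whatever the job CCSID is, the intermediate bytes differ but the UTF-8
# result is the same.
for codec in ("cp037", "cp500", "cp273"):
    print(codec, line.encode(codec).hex(), to_utf8_via_job(line, codec))
```

The intermediate EBCDIC bytes are allowed to differ per job; what matters is that the same CCSID is used in both steps, which is exactly what coding the Unicode value buys you.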

Hope you get the idea.