Hi Ashish,

If you have enough control over your XML generation, Thorbjørn's suggestion makes sense and should pretty much work anywhere.

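AFAIK, CCSID 5026 uses at most two bytes per character (it mixes single-byte and double-byte characters), so the triple bytes you see are, I believe, artifacts of UTF-8 encoding. UTF-8 needs three bytes for code points from hex 0800 through FFFF, which covers the Japanese kana and kanji. So, you could change the encoding to UTF-16, where those characters take two bytes each. But that's only part of the story. Other parts are the encoding you use to save the file and the tool you use to read it.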
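To make the byte counts concrete, here is a small Java sketch that prints how many bytes a few sample characters take in UTF-8 and in UTF-16. The class and method names are made up for illustration, and the sample characters are my own choice, not data from the file in question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8Width {
    // Number of bytes the given text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 big-endian, without a byte order mark.
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Latin letter, accented letter, the last 2-byte code point,
        // the first 3-byte code point, and a kanji (U+65E5).
        String[] samples = { "A", "\u00E9", "\u07FF", "\u0800", "\u65E5" };
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8 = %d byte(s), UTF-16 = %d byte(s)%n",
                    (int) s.charAt(0), utf8Length(s), utf16Length(s));
        }
    }
}
```

The switch from two to three UTF-8 bytes happens between U+07FF and U+0800, while every one of these characters stays at two bytes in UTF-16.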

I ran into an issue the other day that gave me fits and renewed my appreciation of what Java does for you. It was pretty simple: a straightforward HTML error page for Apache that included French. I got the famous boxes and question marks, even though I had specified UTF-8 as the encoding. The base problem was that Windows WordPad defaulted to the system encoding (1252, I think) when saving. I tried saving as Unicode, but WordPad writes a BOM and the browsers didn't like it. I could have found a tool that would save it properly, but people down the road might not have it, so I owned up to my red face and changed the declared encoding to ISO 8859-1, which worked for the French characters. With Java in between, I never would have seen the issue at all.
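The mismatch described above can be reproduced in a few lines of Java. This is only a sketch: the class and method names are invented, and it mimics "saved in one encoding, read as another" with in-memory byte arrays rather than a real WordPad file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class EncodingMismatch {
    // Encode text with one charset and decode the resulting bytes with another,
    // imitating a file saved in one encoding but declared as a different one.
    static String saveThenRead(String text, Charset savedAs, Charset readAs) {
        return new String(text.getBytes(savedAs), readAs);
    }

    public static void main(String[] args) {
        String french = "caf\u00E9";
        Charset cp1252 = Charset.forName("windows-1252");

        // Saved as 1252 but read as UTF-8: the accented letter is malformed
        // UTF-8 and comes back as the replacement character U+FFFD.
        System.out.println(saveThenRead(french, cp1252, StandardCharsets.UTF_8));

        // Saved as UTF-8 but read as ISO 8859-1: each UTF-8 byte becomes
        // its own character.
        System.out.println(saveThenRead(french, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));

        // Saved and read with the same encoding: round-trips cleanly.
        System.out.println(saveThenRead(french, cp1252, cp1252));
    }
}
```

Whether the console shows a box, a question mark, or something else for U+FFFD depends on the console's own encoding and font, which is one more tool in the chain.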

So, I believe the moral is: if you're using an encoding other than the default on your box, be sure the tools you use are capable of saving and reading that encoding. HTH,
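As a rough illustration of that moral for the XML case, the sketch below names the charset explicitly when writing, so the bytes match the encoding stated in the XML declaration regardless of the platform default. It assumes the Japanese text has already reached Java as a String (for example through a JDBC driver that converts from the column's CCSID); the class name, the `<item>` element and the sample text are all made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class XmlUtf8Writer {
    // Escape the characters that are not allowed as-is in XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a tiny XML document whose bytes really are UTF-8,
    // matching the encoding named in the declaration.
    static byte[] toUtf8Xml(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write("<item>" + escape(value) + "</item>\n");
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] xml = toUtf8Xml("\u65E5\u672C\u8A9E"); // sample Japanese text
        // Decode with the same charset that was used for writing.
        System.out.print(new String(xml, StandardCharsets.UTF_8));
    }
}
```

In a real program the same OutputStreamWriter would wrap a FileOutputStream instead of the in-memory stream, and the reader on the other end has to honor the declared encoding.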


Joe Sam

Joe Sam Shirah - http://www.conceptgo.com
conceptGO - Consulting/Development/Outsourcing
Java Filter Forum: http://www.ibm.com/developerworks/java/
Just the JDBC FAQs: http://www.jguru.com/faq/JDBC
Going International? http://www.jguru.com/faq/I18N
Que Java400? http://www.jguru.com/faq/Java400

----- Original Message -----
From: "Ashish Kulkarni" <kulkarni_ash1312@xxxxxxxxx>
To: <java400-l@xxxxxxxxxxxx>
Sent: Wednesday, July 30, 2008 10:05 AM
Subject: XML file and Japanese characters


Hi
Has anyone worked on creating an XML file from a Japanese database that contains 3-byte characters?
The AS400 file is created with CCSID 5026. I need to get data from this file and create an XML file, which will be sent to another program.
Currently the issue is that when I create the XML file with UTF-8, these Japanese characters become unreadable.
So how do I convert these characters to readable UTF-8? Or do I have to create the XML file with some other encoding?
Any ideas? Has anyone worked on a project where you need to get data from a non-English database into an XML file?

Ashish



--
This is the Java Programming on and around the iSeries / AS400 (JAVA400-L) mailing list
To post a message email: JAVA400-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/java400-l
or email: JAVA400-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/java400-l.



This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.
