On Wed, Jul 12, 2017 at 4:57 AM, Vernon Hamberg
<vhamberg@xxxxxxxxxxxxxxx> wrote:
My main question was, do I understand what is happening, and is there some
way to know the byte position where the problem occurred.
Well, I don't know if any of us can really know if you understand what
is happening. ;) But as a rule, there isn't a way to know the exact
byte position when you're working with variable-length characters
(such as UTF-8). At least, I don't believe there is an *easy* way.
You could, in principle, calculate the byte position yourself, much
the way applications can calculate the pixel width of a line of
proportionally spaced text. But it's tedious, and not recommended.
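
To make the "calculate it yourself" idea concrete, here is a minimal
Python 3 sketch. It assumes you already have the decoded text and know
the *character* index you care about. The function name
utf8_byte_offset is made up for illustration and is not part of any
library.

```python
def utf8_byte_offset(text, char_index):
    """Return the byte offset where text[char_index] starts in UTF-8.

    Hypothetical helper: it re-encodes everything before the character
    and counts the bytes, which is simple but not fast for large inputs.
    """
    return len(text[:char_index].encode("utf-8"))


def utf8_widths(text):
    """Return the UTF-8 byte width of each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    sample = "a\u20acb"  # 'a', EURO SIGN (3 bytes in UTF-8), 'b'
    print(utf8_widths(sample))          # [1, 3, 1]
    print(utf8_byte_offset(sample, 2))  # 4
```

This only works once the data has been decoded successfully, which is
part of why it is tedious in practice.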
Earlier, you said:
"I'm aware of - and would like to use - the SQL XML functionality,
which does NOT seem to have these problems - I've tried a few bits.
Problem is, it'd be a rewrite of this program."
If I may ask, why didn't you go with SQL to start with?
I feel for you, but at this point, I think Nathan's suggestion (or a
variation thereof) is probably the best you can do. Or maybe even the
SQL rewrite.
I don't know your situation (like what you're allowed to use, not to
mention more nitty-gritty details of your problem), but just from the
sound of it, if I were working on your problem, I would totally use
Python (from IBM's 5733-OPS). Even if only to do the filtering step,
and passing along the cleaned-up stream to your existing program. It
wouldn't involve banning certain hex ranges; you could (and should)
work at the character level. Whenever you encounter a *character* that
cannot be safely converted to the encoding that you need (sounds like
UCS-2?) you can simply omit it or replace it in the *character*
stream. No fiddling with raw bytes yourself.
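
As a rough sketch of what that filtering step might look like in
Python 3: the code below assumes the input is UTF-8 and that "cannot be
safely converted" simply means "outside the range UCS-2 can represent"
(code points above U+FFFF). Your real rule may well be different. The
names clean_for_ucs2 and filter_stream are hypothetical, not from any
library.

```python
import io

BMP_MAX = 0xFFFF  # highest code point UCS-2 can represent (assumed rule)


def clean_for_ucs2(text, replacement="\ufffd"):
    """Replace every character above U+FFFF; pass "" to omit instead."""
    return "".join(ch if ord(ch) <= BMP_MAX else replacement for ch in text)


def filter_stream(src, dst, replacement="\ufffd", chunk_size=64 * 1024):
    """Copy text from src to dst, cleaning each chunk along the way.

    src and dst are text-mode streams, so read() returns whole
    characters and a multi-byte sequence is never split across chunks.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(clean_for_ucs2(chunk, replacement))


if __name__ == "__main__":
    src = io.StringIO("ok \U0001F600 done")  # contains one non-BMP character
    dst = io.StringIO()
    filter_stream(src, dst)
    print(dst.getvalue())  # the emoji is replaced by U+FFFD
```

With real files you would open the input with something like
open(path, encoding="utf-8", errors="replace") and the output in
whatever encoding your existing program expects. errors="replace" also
turns malformed byte sequences into U+FFFD rather than raising an
exception.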
I suspect other languages available on the i may be as strong as
Python when it comes to stream handling and character encodings. I
would imagine Java is, though I don't know it well enough to say for
sure. I understand you've already invested a good deal of time and
effort into this task, and now may not be the time to learn something
completely new, just to finish the task. But then again, maybe it is.
(I could give you a crash course on stream file handling in Python.)
At the very least, it's something to keep in mind for the future.
John Y.