On Wed, Jul 12, 2017 at 4:57 AM, Vernon Hamberg
<vhamberg@xxxxxxxxxxxxxxx> wrote:
> My main question was, do I understand what is happening, and is there some
> way to know the byte position where the problem occurred.

Well, I don't know if any of us can really know if you understand what
is happening. ;) But as a rule, there isn't a way to know the exact
byte position when you're working with variable-length characters
(such as UTF-8). At least, I don't believe there is an *easy* way.

You could, in principle, calculate the byte position yourself, much
the way applications can calculate the pixel width of a line of
proportionally spaced text. But it's tedious, and not recommended.
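
To give a feel for what "calculating it yourself" means, here is a
minimal Python sketch. It assumes you already know the *character*
index where the trouble is, and that the data is UTF-8; the function
name byte_offset and the sample string are made up for illustration.

```python
def byte_offset(text, char_index, encoding="utf-8"):
    """Return how many bytes precede text[char_index] once text is encoded."""
    # Encode only the characters before the index and count the bytes.
    return len(text[:char_index].encode(encoding))


# Hypothetical sample: 'ï' (U+00EF) takes 2 bytes in UTF-8, '€' (U+20AC) takes 3.
sample = "na\u00efve \u20ac5"

print(byte_offset(sample, 3))  # byte offset of 'v' -> 4
print(byte_offset(sample, 7))  # byte offset of '5' -> 10
```

Note that this re-encodes the prefix on every call, which is fine for
a one-off diagnostic but slow if you call it for every character of a
large file.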

Earlier, you said:

"I'm aware of - and would like to use - the SQL XML functionality,
which does NOT seem to have these problems - I've tried a few bits.
Problem is, it'd be a rewrite of this program."

If I may ask, why didn't you go with SQL to start with?

I feel for you, but at this point, I think Nathan's suggestion (or a
variation thereof) is probably the best you can do. Or maybe even the
SQL rewrite.

I don't know your situation (like what you're allowed to use, not to
mention more nitty-gritty details of your problem), but just from the
sound of it, if I were working on your problem, I would totally use
Python (from IBM's 5733-OPS). Even if only to do the filtering step,
and passing along the cleaned-up stream to your existing program. It
wouldn't involve banning certain hex ranges; you could (and should)
work at the character level. Whenever you encounter a *character* that
cannot be safely converted to the encoding that you need (sounds like
UCS-2?) you can simply omit it or replace it in the *character*
stream. No fiddling with raw bytes yourself.
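
Here is a rough sketch of what I mean, not a finished tool. I'm
assuming the input is UTF-8 and the target really is UCS-2, in which
case the characters that can't be converted are the ones above U+FFFF
(outside the Basic Multilingual Plane). The names clean_for_ucs2 and
filter_stream, and the "?" replacement, are just my choices for the
example.

```python
import io

BMP_MAX = 0xFFFF  # highest code point UCS-2 can represent


def clean_for_ucs2(text, replacement="?"):
    """Replace every character above U+FFFF; pass replacement="" to omit it."""
    return "".join(ch if ord(ch) <= BMP_MAX else replacement for ch in text)


def filter_stream(src, dst, replacement="?"):
    """Copy a text stream line by line, cleaning each line on the way."""
    for line in src:
        dst.write(clean_for_ucs2(line, replacement))


# Demo with in-memory streams; U+1F600 is an emoji outside the BMP.
src = io.StringIO("<name>Zo\u00eb \U0001F600</name>\n")
dst = io.StringIO()
filter_stream(src, dst)
print(dst.getvalue(), end="")  # <name>Zoë ?</name>
```

With real files you would pass file objects opened in text mode, e.g.
open(path, encoding="utf-8", errors="replace"), so that Python does
the decoding and any invalid byte sequences also turn into replacement
characters instead of raising an error.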

I suspect other languages available on the i may be as strong as
Python when it comes to stream handling and character encodings. I
would imagine Java is, though I don't know it well enough to say for
sure. I understand you've already invested a good deal of time and
effort into this task, and now may not be the time to learn something
completely new, just to finish the task. But then again, maybe it is.
(I could give you a crash course on stream file handling in Python.)
At the very least, it's something to keep in mind for the future.

John Y.


This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.
