RE: LEFT vs SUBSTR -- MIDRANGE-L

Thanks Simon. I didn't realize u-umlaut takes 2 bytes in UTF-8.

I tested the example using a German setup (DEU/DE/273) and both built-ins
returned 'Jü', so I'm still convinced that both built-ins work as designed.

It would have been helpful if IBM had documented which CCSID they were
using when they INSERTed the data into that UTF-8 field, as well as which
CCSID they were using when they ran the query and displayed the results.
All of these settings matter in this case.

And in typical IBM fashion they don't advise any best practices (where to
use LEFT vs SUBSTR or vice versa).

Elvis

Celebrating 11-Years of SQL Performance Excellence on IBM i5/OS and OS/400
www.centerfieldtechnology.com

-----Original Message-----
Subject: Re: LEFT vs SUBSTR

From the InfoCentre under SUBSTR:

"1 The SUBSTR function accepts mixed data strings. However, because
SUBSTR operates on a strict byte-count basis, the result will not
necessarily be a properly formed mixed data string."

What you see is expected behaviour.

LEFT operates on characters. You only specify the number of them.
LEFT works out where they start and end, so it implicitly handles
multi-byte characters.

SUBSTR operates on bytes. You specify the starting position and the
length in bytes, so if the length stops in the middle of a multi-byte
character you will get crap returned.
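
To make the character-count versus byte-count distinction concrete,
here is a minimal Python sketch. It is not DB2 code and does not call
any DB2 API; it only mimics the two behaviours as described above. The
sample value 'Jürgen' and the helper names left_chars and substr_bytes
are made up for illustration.

```python
# Illustration only: mimics character-based vs byte-based truncation.
# The sample value and the function names are hypothetical, not DB2 APIs.

def left_chars(text: str, n: int) -> str:
    """Take the first n characters, the way LEFT is described above."""
    return text[:n]

def substr_bytes(text: str, start: int, length: int) -> bytes:
    """Take `length` bytes from 1-based byte position `start` of the UTF-8 form."""
    raw = text.encode("utf-8")
    return raw[start - 1:start - 1 + length]

name = "Jürgen"                      # assumed sample data
by_chars = left_chars(name, 2)       # counts characters
by_bytes = substr_bytes(name, 1, 2)  # counts bytes
# Decode leniently so the broken trailing byte becomes U+FFFD instead of raising.
shown = by_bytes.decode("utf-8", errors="replace")
print(repr(by_chars), by_bytes.hex(), repr(shown))
```

The character-based cut gives 'Jü'. The byte-based cut gives the two
bytes x'4AC3', which stop after the first half of the ü, so the result
is no longer well-formed UTF-8.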

Although u-umlaut appears to be a single character, in UTF-8 it is
represented as two bytes.

CCSID 37 ü x'DC'
CCSID 819 ü x'FC'
CCSID 1208 ü x'C3BC'
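
The byte values in the list above can be reproduced with a short Python
sketch. The mapping of Python codec names to CCSIDs (cp037 for 37,
latin-1 for 819, utf-8 for 1208) is my assumption of the closest
equivalents, not something taken from the IBM documentation.

```python
# Encode ü with the Python codecs assumed to correspond to the CCSIDs above.
# Assumed mapping: cp037 ~ CCSID 37, latin-1 ~ CCSID 819, utf-8 ~ CCSID 1208.
codecs_by_ccsid = {37: "cp037", 819: "latin-1", 1208: "utf-8"}

encoded = {ccsid: "ü".encode(codec) for ccsid, codec in codecs_by_ccsid.items()}

for ccsid, raw in encoded.items():
    # Print in the same x'..' notation used in the list above.
    print(f"CCSID {ccsid:>4}  ü  x'{raw.hex().upper()}'  ({len(raw)} byte(s))")
```

Only the UTF-8 form is longer than one byte, which is why a byte-based
length can land in the middle of the character there and not in the
single-byte CCSIDs.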

Clear?

Mind you, I don't believe the E-acute is correct.
