Re: REGEXP_INSTR range -- MIDRANGE-L

Thanks Arnie. I did actually try both x'3f' and x'3F' and neither one worked.

I think \xhh is just hex. A byte is an 8-bit number, with values from 0-255; hex is just a convenient way of writing those numbers. Each group of 4 bits holds a number from 0-15, which in hex is 0-F (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F). Put the hex digits for the two 4-bit numbers together and you've described the value of a single 8-bit byte.
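Here is a quick way to check the nibble arithmetic. It is sketched in Python purely for illustration, and the helper name `byte_from_nibbles` is made up, not anything in Db2. It shows that x'3F' is 3 × 16 + 15 = 63, one below x'40' = 64.

```python
def byte_from_nibbles(high: int, low: int) -> int:
    """Hypothetical helper: return the byte whose two hex digits are `high` and `low`."""
    if not (0 <= high <= 15 and 0 <= low <= 15):
        raise ValueError("each nibble must be in the range 0-15")
    return high * 16 + low  # same as (high << 4) | low

value = byte_from_nibbles(0x3, 0xF)  # x'3F': high nibble 3, low nibble F (15)
print(value)                         # 63
print(f"{value:02X}")                # 3F
print(byte_from_nibbles(0x4, 0x0))   # 64, i.e. x'40'
```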

But you are correct that I'm dealing with EBCDIC. For my purposes, I want to identify any byte with a value less than x'40' (decimal 64), and that's the range I'm specifying (I think) with [\x00-\x3F]. In EBCDIC, those are control characters and other non-printable characters.
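The Python sketch below shows what those low EBCDIC bytes become once they are converted to Unicode. It assumes CCSID 37 (US EBCDIC), which Python models with its `cp037` codec; other EBCDIC CCSIDs may map some bytes differently. Under that assumption, x'00' to x'3F' all come out as control characters. Many of them, however, land above U+003F (U+007F or the C1 controls U+0080 to U+009F). A Unicode range such as [\x00-\x3F] would therefore not catch all of them.

```python
import unicodedata

# Assumption: CCSID 37 (US EBCDIC), modelled by Python's cp037 codec.
low_bytes = bytes(range(0x40))        # EBCDIC x'00' through x'3F'
decoded = low_bytes.decode("cp037")   # the same bytes as Unicode characters

# Every one of them is a control character (Unicode category "Cc")...
print(all(unicodedata.category(ch) == "Cc" for ch in decoded))  # True

# ...but some map above U+003F, outside a Unicode [\x00-\x3F] range.
outside = [
    f"x'{b:02X}' -> U+{ord(ch):04X}"
    for b, ch in zip(low_bytes, decoded)
    if ord(ch) > 0x3F
]
print(len(outside), outside[:3])
```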

You did inspire me to test a little more thoroughly. Here's what I tried:

values regexp_instr('abcdef-ghijk' || x'3f', '[\x00-\x3f]') returns 7, which is what I expected.

values regexp_instr('- - - - - - - - - - -', '[\x00-\x3F]') returns 1, which is NOT what I expected.
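One way to make sense of these two results is to model REGEXP_INSTR as running the pattern against the Unicode form of the string. In that model, \xhh means code point U+00hh, which is the interpretation Arnie suggests in the quoted message. The Python sketch below is only that model, not Db2's implementation. `regexp_instr_model` is a hypothetical name, and CCSID 37 is assumed.

Under this model, position 7 of 'abcdef-ghijk' is the hyphen (U+002D, inside the range). So the 7 may be coming from the '-' rather than from the appended x'3F' byte, which is at position 13.

```python
import re

def regexp_instr_model(ebcdic: bytes, pattern: str) -> int:
    """Hypothetical model of REGEXP_INSTR (not Db2's implementation): convert the
    EBCDIC bytes to Unicode (CCSID 37 assumed), search, and return a 1-based
    position, or 0 when there is no match."""
    text = ebcdic.decode("cp037")
    m = re.search(pattern, text)
    return m.start() + 1 if m else 0

# 'abcdef-ghijk' || x'3f'
s1 = "abcdef-ghijk".encode("cp037") + b"\x3f"
print(regexp_instr_model(s1, r"[\x00-\x3f]"))  # 7: the '-' (U+002D), not x'3F'

# Same string without the hyphen: now the match is the x'3F' byte (U+001A).
s2 = "abcdefghijk".encode("cp037") + b"\x3f"
print(regexp_instr_model(s2, r"[\x00-\x3f]"))  # 12

# '- - - - - - - - - - -'
s3 = "- - - - - - - - - - -".encode("cp037")
print(regexp_instr_model(s3, r"[\x00-\x3F]"))  # 1: '-' is U+002D
```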

And in case you were wondering,

values hex('- - - - - - - - - - -')

VALUES
604060406040604060406040604060406040604060
******** End of data ********

And both of the following return 'false':

values case when x'60' < x'3f' then 'True' else 'false' end

values case when x'60' between x'00' and x'3f'
then 'True' else 'false' end
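Those CASE tests compare the stored EBCDIC bytes, so x'60' is correctly not below x'3F'. If the regex range is instead applied to Unicode code points (an assumption, following Arnie's reading of \xhh), the comparison that matters is against U+002D, the Unicode hyphen. The Python sketch below shows the difference, again assuming CCSID 37.

```python
# Assumption: CCSID 37 (US EBCDIC), modelled by Python's cp037 codec.
hyphen = "-"

# What HEX() shows: the stored EBCDIC byte for '-'.
ebcdic_byte = hyphen.encode("cp037")[0]
print(f"{ebcdic_byte:02X}")             # 60

# Byte comparison, like the CASE tests: x'60' is not in x'00'..x'3F'.
print(0x00 <= ebcdic_byte <= 0x3F)      # False
print(b"\x60" < b"\x3f")                # False

# Code-point comparison, like a Unicode regex range: '-' is U+002D.
print(f"U+{ord(hyphen):04X}")           # U+002D
print(0x00 <= ord(hyphen) <= 0x3F)      # True
```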

--
*Peter Dow*
Dow Software Services, Inc.
909 793-9050
petercdow@xxxxxxxxx <mailto:petercdow@xxxxxxxxx>
pdow@xxxxxxxxxxxxxx <mailto:pdow@xxxxxxxxxxxxxx>

On 8/15/2021 4:32 AM, Arnie Flangehead wrote:

I think I inadvertently gave you a bum steer. I was thinking about
character collating rather than the underlying hex.

However, I may be able to redeem myself. I think the \xhh is actually ASCII
hex rather than EBCDIC hex.

Reason I say that is that if you look at a conversion table such as this one
<https://www.ibm.com/docs/en/xl-fortran-aix/16.1.0?topic=appendix-ascii-ebcdic-character-sets>
you'll see that, for instance, capital A to G is ASCII hex 41 to 47, and if
you try this in SQL:

with foo(bar) as (
values('ABCDEFG'))
SELECT foo.bar,
regexp_instr(foo.bar,'[\x43-\x44]')
FROM foo

You get position 3, which is 'C' (ASCII hex'43'), whereas EBCDIC hex'43' is ä
(lower-case a with umlaut, in case it doesn't show properly in the browser).
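Arnie's test can be reproduced in Python, used here only as an illustration with CCSID 37 assumed for the EBCDIC side. Python's `re` treats \xhh as a Unicode code point, which for these values coincides with ASCII.

```python
import re
import unicodedata

# Arnie's query: the range \x43-\x44 matches 'C' (U+0043) and 'D' (U+0044).
m = re.search(r"[\x43-\x44]", "ABCDEFG")
print(m.start() + 1)  # 3 (1-based, like REGEXP_INSTR)

# In EBCDIC CCSID 37, byte x'43' is a different character entirely.
ch = b"\x43".decode("cp037")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS

# And 'C' itself is stored as x'C3' in CCSID 37.
print("C".encode("cp037").hex().upper())  # C3
```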

If it's any help I have an unprintable character finder UDF I'm happy to
post (gives the hex value, the display value and the position in the
string).
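Arnie's UDF isn't shown in the thread, so the following is not it. It is a hypothetical Python sketch of the same idea, reporting the position, the hex value and a display form for each suspect byte. It checks the raw EBCDIC bytes against x'40' directly rather than using a regex. `find_low_bytes` is a made-up name, and CCSID 37 is assumed for the display column.

```python
def find_low_bytes(ebcdic: bytes, codec: str = "cp037"):
    """Hypothetical finder: return (1-based position, EBCDIC hex, Unicode code point)
    for every byte below x'40'. The codec (CCSID 37 by default) is an assumption."""
    found = []
    # cp037 is single-byte, so the bytes and decoded characters line up one-to-one.
    for pos, (byte, ch) in enumerate(zip(ebcdic, ebcdic.decode(codec)), start=1):
        if byte < 0x40:  # the criterion from the thread: any byte below x'40'
            found.append((pos, f"{byte:02X}", f"U+{ord(ch):04X}"))
    return found

data = "abcdef-ghijk".encode("cp037") + b"\x3f"
print(find_low_bytes(data))  # [(13, '3F', 'U+001A')]
```

Checking the bytes directly avoids depending on how the regex engine interprets \xhh.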
