× The internal search function is temporarily non-functional. The current search engine is no longer viable and we are researching alternatives.
As a stop gap measure, we are using Google's custom search engine service.
If you know of an easy to use, open source, search engine ... please contact support@midrange.com.



Thanks again Tim, Arnie, and John.

I couldn't let this go until I understood it well enough to do something with it. So for anyone else interested, I replicated the implicit cast with

    values hex( cast('grave`concern-_' as dbclob(1k) ccsid 1200) )
    values hex( cast('grave`concern-_' as dbclob(1k) ccsid 13488) )

    0067007200610076006500600063006F006E006300650072006E002D005F
0067007200610076006500600063006F006E006300650072006E002D005F

Note the CCSID - data is being converted from EBCDIC to Unicode, UTF-16, big endian.

From
https://www.ibm.com/docs/en/db2/11.5?topic=uce-code-page-coded-character-set-identifier-ccsid-numbers-unicode-graphic-data

"CCSID 13488 and code page 1200 both refer to UTF-16, and are handled the same way, except for the value of their 'double-byte' (DBCS) space: U+0020 and U+3000 respectively. In a Unicode database, U+3000 has no special meaning.  The same conversion tables are used for both code page 1200 and CCSID 13488."

Pretty much everything I would use is in the x'0000' - x'00ff' range.  See https://www.unicode.org/charts/    ; Basic Latin (ASCII) and Latin-1 Supplement.

Knowing all that, the following REGEXP_INSTR expressions work:

    values regexp_instr( 'grave`concern-_', '\x72')
    returns 2

    values regexp_instr( 'grave`concern-_', '[\x6e-\x6f]')
    returns 8

    values regexp_instr( 'grave`concern-_', '[\x00-\x30]')
    returns 14

and the following return zero, because REGEXP_INSTR is looking at characters, not bytes:

    values regexp_instr( 'grave`concern-_', '[\x00-\x01]')

    values regexp_instr( 'grave`concern-_', '\x00')

--
*Peter Dow* /
Dow Software Services, Inc.
909 793-9050
petercdow@xxxxxxxxx <mailto:petercdow@xxxxxxxxx>
pdow@xxxxxxxxxxxxxx <mailto:pdow@xxxxxxxxxxxxxx> /

On 8/16/2021 1:44 AM, Tim Fathers wrote:
According to the SQL manual, the source string and pattern string are both cast to UTF-16:

"source-string
An expression that specifies the string in which the search is to take place. The
expression must return a value that is a built-in character string, graphic string,
numeric, or datetime data type. If the value is not a UTF-16 DBCLOB, it is
implicitly cast to a UTF-16 DBCLOB before searching for the regular expression
pattern."

Also, according to this https://www.regular-expressions.info/charclass.html "Hyphens at other positions in character classes where they can’t form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this."

Tim.


As an Amazon Associate we earn from qualifying purchases.

This thread ...

Replies:

Follow On AppleNews
Return to Archive home page | Return to MIDRANGE.COM home page

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].

Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.