Re: ORDER BY and alternate collating sequences -- MIDRANGE-L

On 27-Mar-2012 05:57 , Joe Pluta wrote:

One thing I'm running into more and more as I integrate IBM i back
ends with the rest of the world is sort sequences. Upper and lower is
one such instance although you can get around that with UPPER at the
cost of some performance. A stickier issue is EBCDIC vs ASCII
collating, and of course specifically digits and non-digits (although
punctuation can be a problem as well).

I'd like to start with simply doing a select with an order by in
STRSQL and get records in, say, a case-insensitive ASCII sequence.
I'd then like to be able to execute that same query via a traditional
ODBC call. However, I'm not sure how to get there. The closest I've
seen is a note on creating an SQL INDEX while the session is set to
SRTSEQ(*LANGIDSHR). This seems to be more for case sensitivity than
ASCII, but I'm willing to learn. Has anyone done this?

For actual [origin as] EBCDIC data, I figure the language Sort Sequences [i.e. the *LANGID specific] collation are probably sufficient. However...

The data could be stored in ASCII\Unicode instead of EBCDIC. I have never utilized the support, but some of the following might be worth a review to determine how to enable and take advantage of an "International Components for Unicode" (ICU) sort sequence table:

_i Connection string keywords - Sort properties i_
http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/rzaik/rzaikconnstrkeywordssortprop.htm
"
Use these iSeries Access ODBC driver connection string keywords to change Sort properties of the ODBC connection.
...
"

Search in the IBM i InfoCenter: unicode collation

Results lead to a search on: ICU sequence

_i ICU sort sequence i_
http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/sqlp/rbafyicu.htm
"
When an International Components for Unicode (ICU) sort sequence table is used, the database uses the language-specific rules to determine the weight of the data based on the locale of the table.

An ICU sort sequence table named en_us (United States locale) can sort data differently than another ICU table named fr_FR (French locale) for example.

The ICU support (5722-SS1 Option 39) properly handles data that is not normalized, producing the same results as if the data were normalized. The ICU sort sequence table can sort all character, graphic, and Unicode (UTF-8, UTF-16 and UCS-2) data.
...

When an ICU sort sequence table is specified, the performance of SQL statements that use the table can be much slower than the performance of SQL statements that use a non-ICU sort sequence table or use a *HEX sort sequence. The slower performance results from calling the ICU support to get the weighted value for each piece of data that needs to be sorted. An ICU sort sequence table can provide more sorting function but at the cost of slower running SQL statements. However, indexes created with an ICU sort sequence table can be created over columns to help reduce the need of calling the ICU support. In this case, the index key already contains the ICU weighted value, so there is no need to call the ICU support.
"

_i ICU-based sort support i_
http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/nls/rbagsicusortsequencesupport.htm
International Components for Unicode (ICU) based sort support ...

_i ICU-based sort sequence support i_
http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/nls/rbagsicusortsequencespt.htm
i5/OS® support for ICU sort sequence is provided in i5/OS functions in the Work management and Database management topics.

_i Sort sequences and normalization in SQL i_
http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/sqlp/rbafyussisql.htm
"
A sort sequence defines how characters in a character set relate to each other when they are compared or ordered. Normalization allows you to compare strings that contain combining characters.

The sort sequence is used for all character, and UCS-2 and UTF-16 graphic comparisons performed in SQL statements. There are sort sequence tables for both single byte and double byte character data. Each single byte sort sequence table has an associated double-byte sort sequence table, and vice versa. Conversion between the two tables is performed when necessary to implement a query. In addition, the CREATE INDEX statement has the sort sequence (in effect at the time the statement was run) applied to the character columns referred to in the index.
...
"

Regards, Chuck