On 27-Mar-2012 05:57 , Joe Pluta wrote:
One thing I'm running into more and more as I integrate IBM i back
ends with the rest of the world is sort sequences. Upper and lower is
one such instance although you can get around that with UPPER at the
cost of some performance. A stickier issue is EBCDIC vs ASCII
collating, and of course specifically digits and non-digits (although
punctuation can be a problem as well).
I'd like to start with simply doing a select with an order by in
STRSQL and get records in, say, a case-insensitive ASCII sequence.
I'd then like to be able to execute that same query via a traditional
ODBC call. However, I'm not sure how to get there. The closest I've
seen is a note on creating an SQL INDEX while the session is set to
SRTSEQ(*LANGIDSHR). This seems to be more for case sensitivity than
ASCII, but I'm willing to learn. Has anyone done this?
For actual [origin as] EBCDIC data, I figure the language Sort
Sequences [i.e. the *LANGID specific] collation are probably sufficient.
The data could be stored in ASCII\Unicode instead of EBCDIC. I have
never utilized the support, but some of the following might be worth a
review to determine how to enable and take advantage of an
"International Components for Unicode" (ICU) sort sequence table:
_i Connection string keywords - Sort properties i_
Use these iSeries Access ODBC driver connection string keywords to
change Sort properties of the ODBC connection.
Search in the IBM i InfoCenter: unicode collation
Results lead to a search on: ICU sequence
_i ICU sort sequence i_
When an International Components for Unicode (ICU) sort sequence table
is used, the database uses the language-specific rules to determine the
weight of the data based on the locale of the table.
An ICU sort sequence table named en_us (United States locale) can sort
data differently than another ICU table named fr_FR (French locale) for
The ICU support (5722-SS1 Option 39) properly handles data that is not
normalized, producing the same results as if the data were normalized.
The ICU sort sequence table can sort all character, graphic, and Unicode
(UTF-8, UTF-16 and UCS-2) data.
When an ICU sort sequence table is specified, the performance of SQL
statements that use the table can be much slower than the performance of
SQL statements that use a non-ICU sort sequence table or use a *HEX sort
sequence. The slower performance results from calling the ICU support to
get the weighted value for each piece of data that needs to be sorted.
An ICU sort sequence table can provide more sorting function but at the
cost of slower running SQL statements. However, indexes created with an
ICU sort sequence table can be created over columns to help reduce the
need of calling the ICU support. In this case, the index key already
contains the ICU weighted value, so there is no need to call the ICU
_i ICU-based sort support i_
International Components for Unicode (ICU) based sort support ...
_i ICU-based sort sequence support i_
i5/OS® support for ICU sort sequence is provided in i5/OS functions in
the Work management and Database management topics.
_i Sort sequences and normalization in SQL i_
A sort sequence defines how characters in a character set relate to each
other when they are compared or ordered. Normalization allows you to
compare strings that contain combining characters.
The sort sequence is used for all character, and UCS-2 and UTF-16
graphic comparisons performed in SQL statements. There are sort sequence
tables for both single byte and double byte character data. Each single
byte sort sequence table has an associated double-byte sort sequence
table, and vice versa. Conversion between the two tables is performed
when necessary to implement a query. In addition, the CREATE INDEX
statement has the sort sequence (in effect at the time the statement was
run) applied to the character columns referred to in the index.