On 10/24/13 9:59 AM, Michael Ryan wrote:
> A continuation of an email that I sent a couple of days ago, and got
> some good feedback. I'm returning a result set to RPG or PHP via a
> On further analysis, I need to provide a starting point for record
> access. I was using static SQL and a BETWEEN clause for each of 10
> fields between *LOVAL and *HIVAL. Takes too long to run... like 60
> seconds over an 800K record file.
> I created individual indexes (EVIs) over each of the 10 possible fields.
For BETWEEN predicates with other than low-value and high-value
on either side of the AND, the better choice is typically a Binary Radix
Index rather than an Encoded Vector Index. The former is probably also
the better choice even when only one side of the AND is something other
than the low\hi value. Obviously when both the hi-value and lo-value
appear on either side of the AND, neither type of index is going to be
helpful. AFAIK the EVI is likely to give better results when using an
IN predicate.
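As a sketch of the two index types on DB2 for i (table and column names
here are hypothetical, not from the thread):

```sql
-- Binary radix index: the usual choice for range (BETWEEN) predicates.
CREATE INDEX mylib.orders_ix1 ON mylib.orders (order_date);

-- Encoded vector index: better suited to IN / equality predicates,
-- typically on columns with relatively few distinct values.
CREATE ENCODED VECTOR INDEX mylib.orders_evi1 ON mylib.orders (region_code);
```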
> Tried dynamic SQL using LIKE, and it's as slow as the initial static
> SQL.
The LIKE predicate is generally the worst possible choice. An
exception is if the effect can be at least partially defined by an
equivalence predicate on the LEFT() portion of the field data, and that
column has a keyed access path defined. But if a table-scan is likely,
then enabling [more] parallel processing is probably ideal.
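The point about the leading portion of the field data can be illustrated
with SQLite, used here only because it is easy to run; the thread is about
DB2 for i, but the principle is the same. A function applied to the column
forces a scan, while the equivalent leading-prefix range predicate can use
the keyed access path:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cust (name TEXT)")
cur.execute("CREATE INDEX cust_name_ix ON cust (name)")

# A function applied to the column defeats the index: full table scan.
plan_fn = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM cust WHERE substr(name, 1, 2) = 'AB'"
).fetchall()

# The equivalent leading-prefix range predicate can use the index.
plan_range = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM cust WHERE name >= 'AB' AND name < 'AC'"
).fetchall()

print(plan_fn[0][3])     # e.g. "SCAN cust"
print(plan_range[0][3])  # e.g. "SEARCH cust USING COVERING INDEX ..."
```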
> Question (finally): What's the best technique for an SQL-based search
> where I return a result set from a search of multiple fields?
Varying characteristics of the database, data, and application could
yield significantly different [in]valid responses; i.e. as always, it
depends. I had already made some comments in the prior thread, specific
to the little bit that was described of the scenario.
Yet the simplest answer is almost always: ensure that at least one of
the predicates can significantly limit the number of rows via an INDEX;
i.e. a predicate that selects only a small fraction of the rows. Every
predicate that runs from lo-value to hi-value should be effectively
ignored by the query engine as _unhelpful_ to implementing the query
[access plan]; better still, omit such predicates from the generated SQL
entirely. When none of the predicates can easily limit the selected
data, the plan will tend toward a full table scan [as when using
left\right generic LIKE predicates].
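One way to act on that advice is to build the dynamic SQL so that any
field still at its full *LOVAL/*HIVAL range contributes no predicate at
all. A minimal sketch; the sentinel values and column names are
assumptions, not from the thread:

```python
# Build a WHERE clause from user-entered ranges, skipping any field whose
# range is still "everything" (the *LOVAL/*HIVAL case from the thread).
LOVAL, HIVAL = "", "\uffff"  # assumed sentinels for "no lower/upper bound"

def build_where(ranges):
    """ranges: dict of column -> (lo, hi). Returns (where_text, parameters).
    Column names must come from a fixed whitelist, never from user input."""
    clauses, params = [], []
    for col, (lo, hi) in ranges.items():
        if lo == LOVAL and hi == HIVAL:
            continue  # unbounded on both sides: contributes nothing, omit it
        clauses.append(f"{col} BETWEEN ? AND ?")
        params.extend([lo, hi])
    where = " AND ".join(clauses) if clauses else "1=1"
    return where, params

where, params = build_where({
    "CUSTNAME": ("A", "B"),       # real restriction: kept
    "CUSTCITY": (LOVAL, HIVAL),   # unbounded: dropped
})
print(where)   # CUSTNAME BETWEEN ? AND ?
print(params)  # ['A', 'B']
```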
Predicates that reference an indexed column, even one with poor
cardinality, will be more likely to be chosen by the optimizer as the
path to the data if the optimization goal is "first I/O"; i.e. a hint
given to the optimizer that a certain number of rows should ideally be
retrieved quickly up-front, which implies a keyed access path to the
data is probably going to be the better choice.
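The "first I/O" goal can be requested per-statement with an OPTIMIZE FOR
clause; a sketch, with a hypothetical table:

```sql
-- Ask the optimizer to favor returning the first rows quickly,
-- which biases it toward a keyed access path over a table scan.
SELECT *
  FROM mylib.orders
 WHERE region_code = 'NE'
 OPTIMIZE FOR 20 ROWS;
```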
Establishing foreign\parent keys for the columns should also help.
If the low-value and high-value are typically beyond the actual implied
but unenforced limits\bounds, then adding CHECK constraints that enforce
the actual\intended boundaries could also help. Either type of
constraint can allow the SQL query engine to implicitly add that as
additional selection, and that may enable better choices by the query
optimizer.
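A sketch of such a CHECK constraint; the names and bounds are
hypothetical, since the thread gives no actual column limits:

```sql
-- If ORDER_QTY is known never to exceed 9999, telling the database so
-- lets the optimizer fold that bound into its selection estimates.
ALTER TABLE mylib.orders
  ADD CONSTRAINT orders_qty_ck CHECK (order_qty BETWEEN 0 AND 9999);
```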
Given the number of predicates and possible indexes, the optimizer
may be reaching a timeout; IIRC that is diagnosed in the access plan
messages and debug messages. The Optimize all access paths (OPTALLAP)
parameter of the OPNQRYF command can override that... and I infer that
the QAQQINI option OPTIMIZE_STATISTIC_LIMITATION may provide a similar
effect for SQL.
The SQL query in the OP of the prior thread had a FOR READ ONLY [aka
FOR FETCH ONLY] clause. The WITH NC and SKIP LOCKED DATA clauses could
potentially also play a role, depending on how the data is being used
elsewhere and what the requirements are for the row-data the application
will retrieve.
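A sketch of those clauses together, again with a hypothetical table.
Note that SKIP LOCKED DATA silently omits rows locked by other jobs, so
it changes the result set and is only appropriate when that is
acceptable to the application:

```sql
SELECT order_id, order_qty
  FROM mylib.orders
 WHERE region_code = 'NE'
   FOR READ ONLY
  WITH CS SKIP LOCKED DATA;
```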