Re: SQL - Slow Like or Between -- MIDRANGE-L

On 10/24/13 9:59 AM, Michael Ryan wrote:

A continuation of an email that I sent a couple of days ago, and got
some good feedback. I'm returning a result set to RPG or PHP via a
stored procedure.

On further analysis, I need to provide a starting point for record
access. I was using static SQL and a Between clause for each of 10
fields between *LOVAL and *HIVAL. Takes too long to run...like 60
seconds over an 800K record file.

I created individual indexes (EVI's) over each of the 10 possible
columns. <<SNIP>>

For the BETWEEN predicates with other than low-value and high-value on either side of the AND, the better choice is typically a Binary Radix Index rather than an Encoded Vector Index. I believe the former is also probably typically going to be better, even when only one value on either side of the AND is not the low\hi value. Obviously when both the hi-value and lo-value are on either side of the AND, neither type of index is going to be helpful. AFaIK the EVI is likely to provide better effects when using an IN predicate.

Tried dynamic SQL using LIKE, and it's as slow as the initial static
technique. <<SNIP>>

The LIKE predicate is generally the worst possible choice. An exception is if the effect can be at least partially defined by an equivalence predicate on the LEFT() portion of the field data, and that column has a keyed access path defined. But if a table-scan is likely, then enabling [more] parallel processing is probably ideal.

Question (finally): What's the best technique for an SQL-based search
where I return a result set from a search of multiple fields?

Varying characteristics of the database, data, and application could yield significantly different [in]valid responses; i.e. as always, it depends. I had already made some comments in the prior thread, specific to the little bit that was described of the scenario.

Yet the simplest answer is almost always: Ensure that at least one of the predicates can significantly limit the number of rows via an INDEX; i.e. low cardinality. Every predicate that is from lo-value to hi-value should be effectively ignored by the query engine as _unhelpful_ to implement the query [access plan]. When none of the predicates can easily limit the selected data, then the plan will tend toward a full table scan [as when using left\right generic LIKE predicates]. Predicates that reference an indexed column with poor cardinality will be more likely to be selected by the query as the path to the data, if the optimization goal is "first I/O"; i.e. a hint given to the optimizer, that a certain number of rows is ideally retrieved quickly up-front, which implies an keyed access path to the data is probably going to be the better choice.

Establishing foreign\parent keys for the columns should also help. If the low-value and high-value are typically beyond the actual implied but unenforced limits\bounds, then adding CHECK constraints that enforce the actual\intended boundaries could also help. Either type of constraint can allow the SQL query engine to implicitly add that as additional selection, and that may enable better choices by the query implementation.

Given the number of predicates and possible indexes, the optimizer may be reaching a timeout; IIRC that is diagnosed in the access plan messages and debug messages. The Optimize all access paths (OPTALLAP) feature of the OPNQRYF could override that... and I infer that the QAQQINI option OPTIMIZE_STATISTIC_LIMITATION may provide similar for the SQL.

The SQL query in the OP of the prior thread had a FOR READ ONLY [aka FOR FETCH ONLY] clause. The WITH NR and SKIP LOCKED DATA clauses could potentially also play a role, depending on how the data is being used elsewhere and what the requirements are for row-data the application obtains.