Re: Using an SQL index over DDS file? -- MIDRANGE-L

On 26-Jul-2010 08:10, Loyd Goodbar wrote:

I created an index in our test environment to support a SELECT
DISTINCT query. The file has 4 million rows in test, and 8
million rows in production. The file is DDS-defined.

A lot of rows for DISTINCT processing, but how many unique values across that index? Generally there is little value, often a greater cost, in using an INDEX which is random I\O, when sequential I\O can be done instead much more efficiently over larger data sets.

The index is in test, but Visual Explain says a table scan will
be performed, with the reason "No indexes exist".

Reported "No index" was found, or "No worthwhile index" was found, where the latter intends to suggest no index compatible with the goal of optimizing the query was found. Then each index not valuable for achieving a better implementation [for optimized access] should have been disregarded for a stated reason.

The query asks for the same columns as the index, and in the same
order. DSPDBR reports the index is associated with the base file.
I'm using the iNav "Run SQL Scripts".

Hmmm. One would think an "index only" implementation might eliminate any preference of sequential over random, given either no ORDER BY or run-time collation request that matches the INDEX.

Here is the index and query:

create index prod_mod.iml10026 on prod_mod.imp1001
(i1azcd asc
, i1date asc
, i1time asc
, i1badt asc);

select distinct
i1azcd as location
, i1date as observationdate
, i1time as observationtime
, i1badt as costdate
from prod_mod.imp1001;

The query will be utilized in embedded SQL in an RPGLE program.

Is there anything I can do to influence how the query performs,
to say "use this index"? This would be similar to SQL Server
"hints".

I agree the INDEX seems good for the query; for lack of an ORDER BY, an index is as good as [actually, for index-only, better than] sequential.

Some possible hints are to: Use an ORDER BY where the run-time collation matches those established in the INDEX. Adjust the "Allow copy data" setting, and\or specify the desired OPTIMIZE clause.

Is this a CQE vs. SQE issue?

I believe DISTINCT processing implementations for the two query engines are vastly different, so that probably is a very important issue. Did VE say which?

Are there any gotchas between running the query in Run SQL
Scripts versus an embedded SQLRPGLE, other than naming convention
and date/time formats? All is on V5R4.

The client connection is with an ASCII CCSID, but I do not think that is an issue. I believe that the query runs on the server in EBCDIC, and then only translates into ASCII, what data goes back to the client. That is, I do not believe DISTINCT has any special meaning that would require translation of the data to determine if the row data was actually distinct according to the target CCSID; something that would be relevant only if multiple EBCDIC character values translated into the same ASCII character value, but depending on design, the effect could be considered a presentation issue rather than a functional issue.

The interactive signon effects a subsystem routing program and initial program that may cause the interactive environment to be different than work entering the system by another means. Also the defaults for some run-time optimization variations may be unique to the chosen environment.

The defaults for OPTIMIZE and "allow copy data" may be different between the embedded and from iNav. This is similar to how STRSQL defaults to something like OPTIMIZE FOR FIRST 20 ROWS for output to the display, but embedded or other interfaces might default to OPTIMIZE FOR ALL ROWS. Testing in any environment to verify the statement gives the expected output is safer than testing to verify that the optimization of the statement is as desired; i.e. the expected output remains the same irrespective of optimization, but the optimization is specific to whatever is the run-time environment.

Regards, Chuck