On 26-Jul-2010 08:10, Loyd Goodbar wrote:
I created an index in our test environment to support a SELECT
DISTINCT query. The file has 4 million rows in test, and 8
million rows in production. The file is DDS-defined.
A lot of rows for DISTINCT processing, but how many unique values
across that index? Generally there is little value, often a greater
cost, in using an INDEX which is random I\O, when sequential I\O can
be done instead much more efficiently over larger data sets.
The index is in test, but Visual Explain says a table scan will
be performed, with the reason "No indexes exist".
Reported "No index" was found, or "No worthwhile index" was
found, where the latter intends to suggest no index compatible with
the goal of optimizing the query was found. Then each index not
valuable for achieving a better implementation [for optimized
access] should have been disregarded for a stated reason.
The query asks for the same columns as the index, and in the same
order. DSPDBR reports the index is associated with the base file.
I'm using the iNav "Run SQL Scripts".
Hmmm. One would think an "index only" implementation might
eliminate any preference of sequential over random, given either no
ORDER BY or run-time collation request that matches the INDEX.
Here is the index and query:
create index prod_mod.iml10026 on prod_mod.imp1001
(i1azcd asc
, i1date asc
, i1time asc
, i1badt asc);
select distinct
i1azcd as location
, i1date as observationdate
, i1time as observationtime
, i1badt as costdate
from prod_mod.imp1001;
The query will be utilized in embedded SQL in an RPGLE program.
Is there anything I can do to influence how the query performs,
to say "use this index"? This would be similar to SQL Server
"hints".
I agree the INDEX seems good for the query; for lack of an ORDER
BY, an index is as good as [actually, for index-only, better than]
sequential.
Some possible hints are to: Use an ORDER BY where the run-time
collation matches those established in the INDEX. Adjust the "Allow
copy data" setting, and\or specify the desired OPTIMIZE clause.
Is this a CQE vs. SQE issue?
I believe DISTINCT processing implementations for the two query
engines are vastly different, so that probably is a very important
issue. Did VE say which?
Are there any gotchas between running the query in Run SQL
Scripts versus an embedded SQLRPGLE, other than naming convention
and date/time formats? All is on V5R4.
The client connection is with an ASCII CCSID, but I do not think
that is an issue. I believe that the query runs on the server in
EBCDIC, and then only translates into ASCII, what data goes back to
the client. That is, I do not believe DISTINCT has any special
meaning that would require translation of the data to determine if
the row data was actually distinct according to the target CCSID;
something that would be relevant only if multiple EBCDIC character
values translated into the same ASCII character value, but depending
on design, the effect could be considered a presentation issue
rather than a functional issue.
The interactive signon effects a subsystem routing program and
initial program that may cause the interactive environment to be
different than work entering the system by another means. Also the
defaults for some run-time optimization variations may be unique to
the chosen environment.
The defaults for OPTIMIZE and "allow copy data" may be different
between the embedded and from iNav. This is similar to how STRSQL
defaults to something like OPTIMIZE FOR FIRST 20 ROWS for output to
the display, but embedded or other interfaces might default to
OPTIMIZE FOR ALL ROWS. Testing in any environment to verify the
statement gives the expected output is safer than testing to verify
that the optimization of the statement is as desired; i.e. the
expected output remains the same irrespective of optimization, but
the optimization is specific to whatever is the run-time environment.
Regards, Chuck
As an Amazon Associate we earn from qualifying purchases.