|
hey guys, it seems someone at iseries News mag is listening in.... this just in from the Club Tech iSeries Programming Tips Newsletter: MYSTERY OF THE QUERY OPTIMIZER AND AN EXISTING INDEX Q: I have a query that sometimes uses an existing index and at other times builds an access path. I'm querying a table of 50 million records, so you can well imagine that the query that builds an access path takes a LONG time. The query that uses an existing index is very fast, on the other hand. For example, consider the following query: Select * From MyTable Where SalesDate >= 20010701 and SalesDate <= 20010731 and SalesAmt >= 1000.00 and SalesAmt <= 2000.00 Order by SalesDate This query selects sales history records where the amount of the sale was between $1,000 and $2,000 (inclusive) and the sale occurred in the month of July in 2001. The query is presented in date order. This query uses an index comprised of SalesDate, SalesRep, and SalesAmt and runs very fast. Now, consider the following query: Select * From MyTable Where SalesDate >= 20010701 and SalesDate <= 20010930 and SalesAmt >= 1000.00 and SalesAmt <= 2000.00 Order by SalesDate This query is identical to the first except that the span of time is a three-month period beginning with July 2001. Why does the first query use the existing index, yet the second one doesn't? A: As you probably know, the iSeries uses a query optimizer whose goal is to execute queries in the timeliest fashion. Your mystery is most likely explained by the key range estimation feature of the query optimizer. The optimizer examines the range of values specified for key fields. Based on your database characteristics and the specific query request, the optimizer predicts whether it is more efficient to use the index or to use some other method. For example, the optimizer could choose to perform a table scan (read the entire table) or to build an access path. The optimizer is deciding that the range of possible key values in your second query is large enough that it is more efficient to build an access path (based on the specifics of the query) than it is to read through the index and then access the dataspace that the selected index entries reference. Your query specifies that all fields should be returned (Select *). This influences the optimizer's decision to not use the existing index. Because all fields are to be retrieved, the optimizer knows that whether or not the system uses the existing index, the system must access the dataspace to retrieve those fields that are not included in the index's key fields. Given the range of key values and the fact that the dataspace must be accessed, the optimizer chooses not to use the existing index. Depending on your requirements, you may be able to force the query to use the existing access path, however. Consider the following query: Select SalesDate, SalesAmt From MyTable Where SalesDate >= 20010701 and SalesDate <= 20010930 and SalesAmt >= 1000.00 and SalesAmt <= 2000.00 Order by SalesDate When you run this query, you'll likely find that though the range of dates is identical to the slow-running query that foregoes use of the existing index, this query uses the index and completes very quickly. Because this query selects only fields that exist in the list of key fields specified by the index, the system can retrieve all required data from the index without the need to access the dataspace. Hence, the optimizer chooses to use the index and all is well. If you need fields that are not included in the list of key fields and you're running the query as part of an application (rather than interactive SQL), you can try a technique that MIGHT prove beneficial. Add a field to the end of your list of key fields that will uniquely identify each record (if necessary, add this unique identifier to the table). Then, issue the following query from your application: Select MyUniqueIDField, From MyTable Where SalesDate >= 20010701 and SalesDate <= 20010930 and SalesAmt >= 1000.00 and SalesAmt <= 2000.00 Order by SalesDate Your application can then cycle through the returned unique identifiers (MyUniqueIDField) and perform a random I/O operation (such as RPG's Chain) to retrieve all necessary fields. I realize this is a two-phase approach, but the overall time to obtain the information you need might be less than that required to build an access path and extract the data.
As an Amazon Associate we earn from qualifying purchases.
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.