Re: SQL: how to speed up join tiny file to giant file -- MIDRANGE-L

On 13-Aug-2015 10:47 -0600, Stone, Joel wrote:

I have a filePROD with 10 million rows, and a fileFIX with 100 rows
(both files are identical columns).

I want to join the two files, and update filePROD with fields from
fileFIX where the primary key matches.

Is that a PRIMARY KEY CONSTRAINT or simply a keyed access path? The former has a larger Page Size preferred by the SQL Query Engine (SQE)

Unfortunately SQL on v5r4 reads thru the entire 10 million records
to locate the 100 to join on.

Running SQE or the Classic Query Engine (CQE)? Presumably the latter; read the debug messages for a description about the implementation and what AccPth(s) were used, or not, and if not, then for what reason(s) they were unused for implementation of the query.

Is it possible to convince SQL to read sequentially thru the tiny
fileFIX with 100 records (instead of the giant file)?

Probably could, irrespective of the engine being used. But given the key-matching request, the likely scenario is that the AccPth will be utilized; there would be little reason to build an access path over the sequentially read data given one already exists. And reading via the keyed access path for such a small number of rows is not really an issue, especially given index-only-access should be possible [but possibly not done if the overall key size is much greater than the one column]. Seems the long-running query is a side effect of failing to use an index on the larger file; i.e. what the query does with the smaller file is probably immaterial.

The following SQL ran for an hour - if I could force it to read the
fileFIX first, it should only take seconds.

Any ideas?

Thanks in advance!

select * from filePROD
where exists
( select docid
from fileFIX
where filePROD.docid = fileFIX.docid
)

If the query ran that long, seems FILEPROD is getting read sequentially.? Again, the debug messages would assist. Do the data type and lengths actually match across the columns and the access paths?

Not that there should be any negative consequence, but given the subquery requires no data, selecting a literal versus a column is best; the optimizer should understand that, and implement appropriately, but no reason to give any false requirements.

A clause asking to OPTIMIZE FOR 10 ROWS should encourage the use of an index over sequential access. As well might an ORDER BY filePROD.docid such that the matching Sort Sequence between the request and the access path could recognize the indexed data selected would already be in the required order for the result set.

This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].

Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.