Re: Suggestions for finding unique values in million+ records -- MIDRANGE-L

Actually, as I think about it, SELECT DISTINCT is probably going to be
the winner...

With SELECT DISTINCT, the DB will know that the answer can be provided
directly by the EVI I suggested. With just a plain SELECT, the DB could
look at the EVI but would then have to expand the result set to cover
every row in the file.

So use the EVIs with SELECT DISTINCT and UNION DISTINCT (note that
UNION == UNION DISTINCT != UNION ALL)
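
A rough sketch of what creating those EVIs might look like on DB2 for i.
MYFILE and the index names are placeholders made up for illustration, and
the optional WITH 500 DISTINCT VALUES sizing hint is an assumption based
on the roughly 500 unique values mentioned in the original post:

```sql
-- Assumed names: MYFILE stands in for the real file; the index
-- names MYFILE_EVI1..3 are hypothetical.
-- One encoded vector index per field, so each branch of the UNION
-- has its own list of distinct values to draw from.
CREATE ENCODED VECTOR INDEX MYFILE_EVI1 ON MYFILE (FIELD1)
  WITH 500 DISTINCT VALUES;

CREATE ENCODED VECTOR INDEX MYFILE_EVI2 ON MYFILE (FIELD2)
  WITH 500 DISTINCT VALUES;

CREATE ENCODED VECTOR INDEX MYFILE_EVI3 ON MYFILE (FIELD3)
  WITH 500 DISTINCT VALUES;
```

Whether the optimizer actually picks the EVIs depends on the release and
the query, so it is worth checking the access plan rather than assuming.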

Charles

On Tue, Dec 15, 2009 at 8:49 AM, Charles Wilt <charles.wilt@xxxxxxxxx> wrote:

In my post I deliberately did not specify SELECT DISTINCT...

I'm thinking you'd be forcing the DB to do more work, finding the
distinct values for each field separately and then again for the group
as a whole. The UNION DISTINCT is all you really need.

On the other hand, divide and conquer has often proven helpful. So I'd
suggest benchmarking both ways.
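
The two variants to benchmark would look roughly like this. MYFILE is a
placeholder table name, not the real file; both statements return the
same set of values and differ only in where the de-duplication happens:

```sql
-- Variant A: plain SELECTs, UNION DISTINCT removes all duplicates.
SELECT FIELD1 FROM MYFILE
UNION DISTINCT
SELECT FIELD2 FROM MYFILE
UNION DISTINCT
SELECT FIELD3 FROM MYFILE;

-- Variant B: each field is de-duplicated first, then the three
-- short lists are combined and de-duplicated again.
SELECT DISTINCT FIELD1 FROM MYFILE
UNION DISTINCT
SELECT DISTINCT FIELD2 FROM MYFILE
UNION DISTINCT
SELECT DISTINCT FIELD3 FROM MYFILE;
```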

HTH,
Charles

On Tue, Dec 15, 2009 at 8:23 AM, <Michael_Schutte@xxxxxxxxxxxx> wrote:

SELECT DISTINCT FIELD1 FROM FILE
UNION DISTINCT
SELECT DISTINCT FIELD2 FROM FILE
UNION DISTINCT
SELECT DISTINCT FIELD3 FROM FILE

Untested. Since there's a UNION DISTINCT, you might not need the SELECT
DISTINCT on the second and third selects. As long as you have separate
indexes over the three fields, this should run fast enough.
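
The separate indexes could be created along these lines. The table and
index names are made up for illustration, and these are ordinary
single-column indexes, one per field:

```sql
-- Hypothetical names; MYFILE stands in for the FILE placeholder
-- used in the query above.
CREATE INDEX MYFILE_IX1 ON MYFILE (FIELD1);
CREATE INDEX MYFILE_IX2 ON MYFILE (FIELD2);
CREATE INDEX MYFILE_IX3 ON MYFILE (FIELD3);
```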

--

Michael Schutte
Admin Professional

midrange-l-bounces@xxxxxxxxxxxx wrote on 12/14/2009 05:37:16 PM:

I have a file with over a million records.
The file has three fields that can contain various values (yes, the
values can be in any of the three fields).
Example:
Record 1  Field 1 = ABC  Field 2 = ABC  Field 3 = 123
Record 2  Field 1 =      Field 2 = XYZ  Field 3 =
Record 3  Field 1 = 123  Field 2 = ABC  Field 3 =
Record 4  Field 1 = 456  Field 2 =      Field 3 =

I need to display a list of unique values from the combined three fields
such as this for the 4 record example above:
Blank
ABC
XYZ
123
456

This file will have over a million records in it,
and the resulting list will usually have fewer than 500 unique values.

I am trying to determine what is the best way to get the list of unique
values into a list (in an interactive job) to be displayed to the user.

I am pretty sure reading a million+ records and looking up every value
in an array and having the array only contain values that are unique
will be VERY SLOW.

I thought about keeping a secondary file containing a separate list of
the unique values as they are entered into the primary file. But then I
have to maintain this file by removing values that are removed from the
primary file, and determining when a value is no longer in the primary
file and time to remove it from the secondary file would become an
issue in itself.

Anybody have any suggestions?

Thanks

John

--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list

To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.
