Thanks Chuck. This would have made dealing with the errors easier. But
there were only five duplicates, and they were dealt with already.
Having the record number would have made it less tedious.

John McKee

On Wed, Feb 25, 2015 at 9:56 AM, CRPence <CRPbottle@xxxxxxxxx> wrote:

On 24-Feb-2015 14:19 -0600, John McKee wrote:

I have a file that is supposed to have one record per account
number. There is at least one duplicate account in the file.


Either a PRIMARY KEY constraint or a UNIQUE KEY constraint would be
beneficial in preventing that negative effect. Otherwise a
non-constraint keyed access path could protect the integrity of the data
as well: either a UNIQUE INDEX created with SQL DDL or, similarly, the
DDS file-level UNIQUE keyword in the PF or an LF.
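
For example, a minimal sketch using the file and key named later in this
thread (SPBSOBDP, SIACCT); the index and constraint names here are made
up, and either statement can only succeed once the existing duplicates
have been removed:

CREATE UNIQUE INDEX SPBSOBDP_UK
  ON SPBSOBDP (SIACCT)

-- or, if SIACCT is defined NOT NULL, as a primary key constraint:
ALTER TABLE SPBSOBDP
  ADD CONSTRAINT SPBSOBDP_PK PRIMARY KEY (SIACCT)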

I'm thinking there is a simple(?) way to locate the duplicates with
SQL.


At least somewhat dependent on what definition of /locate/ is being
alluded to, but probably still /simple/ irrespective of that definition.

But, I am missing something. I have:

select siacct, count(siacct)
from spbsobdp
group by siacct
order by siacct


The above query really intends to count the rows, not the values of
SIACCT, yet COUNT(field_name) is very often seen coded where the intent
is clearly COUNT(*). Failing to code COUNT(*) where intended can lead to
unexpected results if the file definition and\or the data changes
[COUNT(SIACCT) counts only the non-NULL values, whereas COUNT(*) counts
every row], and can cause poorer performance due to the query having to
perform more work than necessary; in any case, avoid counting the
_values_ of a column when the actual intent is to count rows.
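
For instance, the query above recoded to count rows rather than values
(same file and field names):

select siacct, count(*)
  from spbsobdp
  group by siacct
  order by siacct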

Which produces a nice list. How can I change the output so only
those records with count > 1 show?


Add the HAVING clause to the existing query to see the list of
duplicated values [however, the detail /records/ behind that aggregation
will still not be output by that query]; e.g.:
HAVING count(*) > 1
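
Applied to the full query, that might look like the following sketch,
using the same file and field names:

select siacct, count(*)
  from spbsobdp
  group by siacct
  having count(*) > 1
  order by siacct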


My first thought is to redirect to a file and then query that. Is
there another option?


Unfortunately that single query, even with the HAVING clause, is not
necessarily all that helpful directly, because the actual RRN of each of
the rows and the other field values of the rows with the duplicated keys
are not revealed in the result set. Knowing only the duplicate key
values requires probing the data again to discover those other details.
The following query can directly reveal the entire row along with its
RRN, and is formatted with the conspicuous intent of being displayed as
a report that visually emphasizes both the RRN and the key value, with
the entire row appearing beyond the colon character; the possibly
differing values of the fields across the rows sharing the
same\duplicate key value are aligned by the report writer, making it
likely one can /see/ which row has the more desirable values than
another:


select rrn(a), a.siacct, char(':'), a.*
  from spbsobdp a
  where a.siacct in ( select s.siacct
                        from spbsobdp s
                        group by s.siacct
                        having count(*) > 1
                    )
  order by siacct


--
Regards, Chuck
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.


