×
The internal search function is temporarily non-functional. The current search engine is no longer viable and we are researching alternatives.
As a stop gap measure, we are using Google's custom search engine service.
If you know of an easy to use, open source, search engine ... please contact support@midrange.com.
On 24-Feb-2015 14:19 -0600, John McKee wrote:
I have a file that is supposed to have one record per account
number. There is at least one duplicate account in the file.
Either a PRIMARY KEY constraint or UNIQUE KEY constraint would be
beneficial to be preventive of that negative effect. Otherwise a
non-constraint keyed access path could protect the integrity of the data
as well, either a UNIQUE INDEX with SQL DDL or similarly a DDS
file-level UNIQUE keyword in the PF or a LF.
I'm thinking there is a simple(?) way to locate the duplicates with
SQL.
At least somewhat dependent on what is the definition of /locate/
being alluded, but probably still /simple/ irrespective that definition.
But, I am missing something. I have:
select siacct, count(siacct)
from spbsobdp
group by siacct
order by siacct
The above query really intends to count the rows, not the values of
SIACCT. Very often the coding of COUNT(field_name) is seen being coded,
where clearly the intent is COUNT(*). Failing to code COUNT(*) where
intended possibly can lead to unexpected results if the file definition
and\or the data changes, and cause poorer performance due to the query
having to perform more work than necessary; in any case, avoid counting
_values_ of a column when actual intent is to count rows.
Which produces a nice list. How can I change the output so only
those records with count > 1 show?
Add the HAVING clause to the existing query to see the list of
duplicate values [however the detail /records/ having that aggregation
will still not be output by that query]; e.g.:
HAVING count(*) > 1
My first thought is to redirect to a file and then query that. Is
there another option?
Unfortunately that single query even with the HAVING clause is not
necessarily all that helpful, directly, because the actual RRN of each
of the rows and the other field values for the row with the duplicated
keys are not revealed in the result-set. Knowing only the duplicate key
values requires probing the data again, to discover those other details.
The following query can directly reveal the entire row along with RRN,
and would be formatted with the conspicuous intent to display in a
report to visually emphasize both the RRN and the Key-value, with the
entire row appearing beyond the colon character; the possibly differing
values of the fields across the rows sharing the same\duplicate key
value, aligned according to the report writer, from which the ability to
/see/ which row has the more desirable values than another row is likely:
select rrn(a), a.siacct, char(':'), a.*
from spbsobdp a
where a.siacct in ( select s.siacct
from spbsobdp s
group by s.siacct
having count(*) > 1
)
order by siacct
As an Amazon Associate we earn from qualifying purchases.