Re: Ad-hoc report of duplicates using SQL or Query -- MIDRANGE-L

On 11-Feb-2012 09:59 , Vinay Gavankar wrote:

I have a file with 6 fields say F1 thru F6.

I want to find all the records in the file where field F1 and F2 is
same.

I also want another report which has all the records where Field F1
is same but F2 is different.

I know that I have stated the requirements badly, so let me
illustrate with an example.

If my file has following records denoted by F1,F2,F3,F4,F5,F6
(the combination of F3,F4,F5,F6 is different in all the records):

<<SNIP sample data; F3-F6 not pertinent; replaced by DDL+DML:>>

create table qtemp/m (f1 char(3), f2 char(3), f3 char)
; -- Table M created in QTEMP.
insert into qtemp/m values
('AAA','XXX','1'), ('AAA','YYY','2'), ('BBB','XXX','3')
, ('CCC','XXX','4'), ('DDD','YYY','5'), ('DDD','YYY','6')
, ('EEE','XXX','7'), ('EEE','ZZZ','8'), ('FFF','XXX','9')
, ('FFF','XXX','A'), ('FFF','XXX','B'), ('FFF','ZZZ','C')
; -- 12 rows inserted in M in QTEMP.

My first report will have records:
DDD,YYY,*,*,*,*
DDD,YYY,*,*,*,*
FFF,XXX,*,*,*,*
FFF,XXX,*,*,*,*
FFF,XXX,*,*,*,*

From the above expected output, I infer that "all the records in the file where field F1 and F2 is same" implies:

select rows from the file where (F1,F2) is non-distinct [i.e. duplicate]

If so, then I believe any of the following SELECT statements should produce that unordered result set:

select * from qtemp/m
where exists ( select 1 from qtemp/m s
where m.f1 = s.f1
and m.f2 = s.f2
group by s.f1, s.f2
having count(*)>1 )
;
select * from qtemp/m
where m.f1 in ( select s.f1 from qtemp/m s
group by s.f1 having count(*)>1 )
and m.f2 in ( select s.f2 from qtemp/m s
where s.f1 = m.f1
group by s.f2 having count(*)>1 )
;
select m.*
from qtemp/m m
inner join
( select f1, f2 from qtemp/m
group by f1, f2 having count(*)>1 ) s
on m.f1=s.f1 and m.f2=s.f2
;
/* I can not test this on v5r3; should work since v5r4? */
select * from qtemp/m
where (m.f1, m.f2) in ( select s.f1, s.f2 from qtemp/m s
group by s.f1, s.f2
having count(*)>1 )
; -- The output from the above SELECT statements:
....+....1..
F1 F2 F3
DDD YYY 5
DDD YYY 6
FFF XXX 9
FFF XXX A
FFF XXX B
******** End of data ********
;

My second report will have records:
AAA,XXX,*,*,*,*
AAA,YYY,*,*,*,*
EEE,XXX,*,*,*,*
EEE,ZZZ,*,*,*,*
FFF,XXX,*,*,*,*
FFF,XXX,*,*,*,*
FFF,XXX,*,*,*,*
FFF,ZZZ,*,*,*,*

From the above expected output, I infer that "all the records where Field F1 is same but F2 is different" implies:

select rows from the file where (F1) is non-distinct [i.e. duplicate] *and* (F2) has distinct values across that duplicate value of F1.

If so, then I believe the following SELECT statement should produce that unordered result set:

select * from qtemp/m
where m.f1 in ( select s.f1 from qtemp/m s
group by s.f1 having count(*)>1 )
and exists ( select 1 from qtemp/m s
where s.f1 = m.f1
having count(distinct s.f2)>1 )
; -- The output from the above SELECT statement:
....+....1..
F1 F2 F3
AAA XXX 1
AAA YYY 2
EEE XXX 7
EEE ZZZ 8
FFF XXX 9
FFF XXX A
FFF XXX B
FFF ZZZ C
******** End of data ********
;

To further clarify, F1 is something called PA#, F2 is a Member Id.
In the file, PA# was "supposed" to be unique, but it is not.
So I want to identify
all records where same PA# is repeated for the Same member
and in another report,
where same PA# is repeated for different members.

Is it possible using only SQL and/or query. Creating a temporary
intermediate file is OK too.

Any help will be really appreciated.

Note: There was no consideration for the NULL value in any of the above.

Regards, Chuck