Re: Remove duplicate records -- MIDRANGE-L

I think I see the logic to do this with a relatively simple RPG program. I have managed to create a base file that has ACCT, CYC#, DATE, TIME, and poster id. Seems like I have to do is read a record from this file, chain to the multiformat logical, check which record format was read, check the poster id (not in an index) and, if it matches, delete the record. Seems too simple. Then, when done, repost the original records.

I had wondered if SQL could do that to remove the uncertainty of a one shot program.

John McKee

-----Original message-----
From: CRPence CRPbottle@xxxxxxxxx
Date: Wed, 23 Sep 2009 11:56:07 -0500
To: midrange-l@xxxxxxxxxxxx
Subject: Re: Remove duplicate records

Data cleansing for that case should not require the original
file, from prior to those extraneous rows being inserted, if the
rules are explicit for the condition of duplicate rows; i.e.
duplicates being incorrect. With such a rule, a cleansing request
could fix any other similar incidents which had the same effect of
duplicate rows being posted. The /testing/ of the delete is
accomplished easily using isolation; i.e. just ROLLBACK if the wrong
rows were selected for the delete, where the journal or record locks
can be used to easily confirm. For any one account, that would
simply require adding the appropriate selection to the WHERE on the
DELETE; e.g. WHERE ACCOUNT='TEST'.

FWiW: Posted as a script versus words, someone could probably
post a SQL script in response which would effect easy correction.
As written, too much is left to the imagination. For example, a
script like the following might describe the scenario, yet this
specific script might instead just show how incorrectly one might
infer what was actually meant to be expressed in /words/ vs had the
details been scripted:

create table HF (HN INT, ACC INT
, DAT INT, TIM INT, IDP CHAR
, TXT CHAR(30)
, primary key (HN, ACC) )
;
create table TF (HN INT, ACC INT
, DAT INT, TIM INT, IDP CHAR
, TXN CHAR(30)
/* no unique key to prevent dup */ )
;
create view JLF (select H.DAT, H.TIM, H.IDP, H.TXT
, T.TXN
from HF H
inner join TF T
on (HN, ACC)
)
;
/* The error case, for one account: */
insert into HF values(7, 77, 090916, 111111, 'A', 'Oops')
;
insert into TF values
(7, 77, 090916, 111111, 'A', 'Oops')
,(7, 77, 090916, 111111, 'A', 'Oops')
,(7, 77, 090916, 111111, 'A', 'Oops')
;

Regards, Chuck

jmmckee wrote:

I made a mistake today. Accidentally ran a posting program three
additional times. <<SNIP words describing scenario>>

Can SQL be used to identify when there are multiple records for
the same hospital and account number that have identical date,
time, and poster id?

Can SQL be used to remove three of the four multiples?

Can SQL be used to do the previous two operations, but for one
specific account - for testing?

Would it be easier if the original posting file was available? My
assumption is yes. and that file can be available.

<<SNIP>>

--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.