On 23 Jan 2013 11:25, Stone, Joel wrote:
> OK, allow me to explain:
> I was tasked with sorting AND performing data reduction on a work
> file, then passing it to another developer.
> I would like to sort WORKFILE1 by CUST# and ITEM#, and eliminate
> duplicate CUST# + ITEM# combinations.
What determines which rows to eliminate? That is, of the rows that
are duplicate across the columns CUST# and ITEM# but are not also
duplicate across all other columns, which one should survive? Or are
rows that are duplicate across CUST# and ITEM# always going to be
duplicate across all columns of the row?
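If the latter, then simple DISTINCT processing suffices. If only the
key columns repeat, then some rule must choose which row survives; a
minimal sketch of one such rule follows, whereby the row with the
lowest relative record number per pair is kept [that tiebreaker is
merely an assumption, not something the requester stated]:

select w.*
from ( select t.*
            , row_number() over
                ( partition by cust#, item#
                  order by rrn(t) ) as rn -- rrn() as arbitrary tiebreaker
         from workfile1 t
     ) as w
where w.rn = 1
-- keeps one survivor per CUST# + ITEM# pair; the RN column would be
-- dropped before the result is inserted anywhere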
> Can this be done in one SQL statement with the result placed back
> in WORKFILE1?
Other than an SQL CALL to do the work in multiple statements [and
maybe CL commands], probably not. Is any solution that requires more
than one SQL statement ruled out? If so, can one presume a script,
object [e.g. a program or QMQRY], or routine created beforehand can
be invoked to do the work; i.e. the SQL CALL would be the one SQL
statement?
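For illustration, a minimal sketch of that presumption; the procedure
name DEDUP_WORKFILE1 is hypothetical, and the deduplicate logic is
elided just as in the script farther below:

create procedure dedup_workfile1 ()
  language sql
  set option commit = *chg -- commitment control per the environment
begin
  declare global temporary table worktemp1 as
    ( select * from workfile1 ) with data ;
  delete from workfile1 ;
  insert into workfile1
    select * from session/worktemp1 -- deduplicate logic elided here
  ;
  drop table session/worktemp1 ;
end

Thereafter the "one SQL statement" is just:

call dedup_workfile1()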
> Or should I create a new table QTEMP/WORKFILE2 using DISTINCT or
> GROUP BY, and then CPYF back to WORKFILE1?
While the CREATE TABLE there is just one SQL statement, the process
as a whole is more than just one SQL statement. If doing those
multiple steps is an option, then would doing the work in more than
one SQL statement also be an option to effect the desired result?
Again, from what was given, it is not clear that DISTINCT [or GROUP
BY to effect DISTINCT] processing will eliminate the correct rows.
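If DISTINCT does prove effective [i.e. the duplicates are full-row
duplicates], then a sketch of that suggestion would be the following,
part SQL and part CL; the FMTOPT(*NOCHK) sidesteps any record format
level check, in case the format of the SQL-created copy differs from
that of the original file:

create table qtemp/workfile2 as
  ( select distinct * from workfile1 ) with data

followed by the CL command:

CPYF FROMFILE(QTEMP/WORKFILE2) TOFILE(WORKFILE1) +
     MBROPT(*REPLACE) FMTOPT(*NOCHK)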
> Or is there a totally different and better method to accomplish this?
Must the original file and member of WORKFILE1 be maintained and only
the data changed, for example to maintain continuous
logging/journaling [i.e. without losing the JID], or is that an
artificial requirement?
Note: a newer reply in the thread seems to suggest the basis for the
"same file" requirement may be primarily about authority, which may
or may not include ownership.
If the original work file must remain the repository of the final
data, then the following script should suffice as an all-SQL
solution, provided the assumptions hold true:
set option commit=*chg -- effected according to environment
; -- script assumes workfile1 is already journaled
; -- assumes workfile1 has no deleted records; RGZPFM can ensure
declare global temporary table worktemp1 as
( select * from workfile1 ) with data
;
delete from workfile1
-- although the rows are deleted, the deletes are pending, so no gaps
;
insert into workfile1
select * from worktemp1
where /* deduplicate logic goes here */
order by /* collated according to SrtSeq */
;
drop table session/worktemp1
;
commit
; -- optionally RGZPFM to compress the deleted records
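For example, if the duplicates prove to be full-row duplicates, then
the two placeholders above collapse into DISTINCT plus the requested
sort; a sketch only, because the actual predicate depends on the
answer to the first question asked above:

insert into workfile1
  select distinct *
  from worktemp1
  order by cust#, item# -- collated per the SrtSeq of the environment
;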