On 23 Jan 2013 11:25, Stone, Joel wrote:
OK allow me to explain:

> I was tasked with sorting AND performing data reduction on a work
> file, then passing it to another developer.
>
> I would like to sort WORKFILE1 by CUST# and ITEM#, and eliminate
> duplicate CUST# + ITEM# combinations.

What determines which rows to eliminate? That is, among rows that are duplicate across the columns CUST# and ITEM# but differ in other columns, which one survives? Or are rows that are duplicate across CUST# and ITEM# always going to be duplicate across every column of the row?
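For illustration, if an arbitrary survivor per combination is acceptable, the "which row to keep" choice can be made explicit with ROW_NUMBER(); ordering by relative record number [i.e. keeping the first-arrived row] is my assumption, not something given in the question:

```sql
-- Keep exactly one row per CUST#/ITEM# combination.
-- Which row survives is decided by the ORDER BY inside the OLAP
-- specification; RRN() [first arrival] is an assumption, not a given.
select t.*
  from ( select w.*
              , row_number() over
                ( partition by CUST#, ITEM#
                  order by rrn(w) ) as rn
           from workfile1 w
       ) t
 where t.rn = 1
```

Note the derived column RN comes along with t.*; listing the columns of WORKFILE1 explicitly in the outer select-list would drop it before the result could replace the original data.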

> Can this be done in one SQL statement with the result placed back
> into WORKFILE1?

Other than an SQL CALL that does the work in multiple statements [and perhaps CL commands], probably not. Are solutions requiring more than one SQL statement ruled out? If so, can one presume that a script, object [e.g. a program or QMQRY], or routine created beforehand could be invoked to do the work; i.e. the SQL CALL would be the one SQL statement?

> Or should I create a new table QTEMP/WORKFILE2 using DISTINCT or
> GROUP BY, and then CPYF back to WORKFILE1?

While that describes just one SQL statement, the overall process is more than one statement. If doing those multiple steps is an option, then would doing more than one SQL statement also be an option to effect the desired result?

Again, from the given information, it is not clear that DISTINCT [or GROUP BY used to effect DISTINCT] processing will be effective.
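To make that caveat concrete: DISTINCT removes only rows that match on every column, so it helps only under the second interpretation above. A sketch of the proposed QTEMP approach under that assumption follows; the CPYF shown is the obvious companion step, not part of the original question:

```sql
-- Effective only if rows duplicated on CUST#/ITEM# are duplicated
-- across ALL columns; otherwise DISTINCT keeps both versions.
create table qtemp/workfile2 as
( select distinct * from workfile1 ) with data
-- then, outside of SQL, copy the reduced data back:
--   CPYF FROMFILE(QTEMP/WORKFILE2) TOFILE(WORKFILE1) MBROPT(*REPLACE)
```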

> Or is there a totally different and better method to accomplish this?

Must the original file and member of WORKFILE1 be maintained with only the data changed, for example to maintain continuous logging/journaling [i.e. without losing the journal identifier (JID)], or is that an artificial requirement? Note: a newer reply in the thread suggests the basis for the "same file" requirement may be primarily about authority, which may or may not include ownership.

If the original work file must remain the repository of the final data, then the following script should suffice, all in SQL, given the assumptions hold true:

set option commit=*chg -- effected according to environment
; -- script assumes workfile1 is already journaled
; -- assumes workfile1 has no deleted records; RGZPFM can ensure that
declare global temporary table worktemp1 as
( select * from workfile1 ) with data
; -- copy the data aside into a temporary session table
delete from workfile1
; -- although records are deleted, the deletes are pending under
; -- commitment control, so no gaps
insert into workfile1
select * from worktemp1
where /* deduplicate logic goes here */
order by /* collated according to SrtSeq */
; -- rewrite only the surviving rows
drop table session/worktemp1
; -- optionally RGZPFM to compress the deleted records
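As one hedged way the commented placeholders might be filled in, keeping the lowest relative record number per combination [an assumption; any rule that identifies the survivor would do, and whether RRN() is the right discriminator on the temporary copy is itself an assumption]:

```sql
-- Re-insert only one row per CUST#/ITEM# combination: the row whose
-- relative record number in the temporary copy is lowest.
insert into workfile1
select * from worktemp1 a
where rrn(a) = ( select min(rrn(b))
                   from worktemp1 b
                  where b.CUST# = a.CUST#
                    and b.ITEM# = a.ITEM# )
order by CUST#, ITEM#
```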
