Re: Using SQL to group records with similar elements -- MIDRANGE-L

On 21-Jul-2011 09:37 , Alan Shore wrote:

<<SNIP>>
I have a file that contains (among other things) 21 occurrences of
the same size field that can contain similar data.

For example
<<ed: addendum: V5r4
3-byte alpha fields, containing either blanks or numbers>>

Key f1 f2 f3 f4 f5 f6 ... f21
00001 360 160 202 400 118
00002 160
00003 400 202 600
00004 160 118
00005 202
00006 400
00007 111

Using record key 00001 as a starting point, I need to group all
records that have any of the non-blank values in any of the
occurrences, EXCLUDING those records that have OTHER non-blank
values.

Using the above example:

Keys 00002, 00004, 00005, 00006 would be grouped with 00001
Key 00003 would not (value 600 NOT in Key 00001)
Key 00007 would not (value 111 NOT in Key 00001)

Hopefully, what I am asking makes sense, AND is feasible.
As always, any and all suggestions are welcome.

Since Key=1 is merely "a starting point", presumably there is already some means to find whatever that key value is. Thus also seems specifying that key value as the remaining selection should easily be effected by a variable versus hard-coded in selection as a literal in a permanent object like a VIEW.

The current physical layout of the data values in columns of matching data-type\attributes in each row, suggests that for the more logical set processing effected by the SQL, those values would best be converted to values in a single column across multiple rows; i.e. SQL set logic is geared toward rows, not columns. That can be accomplished by defining a "pivot query". The pivot query can be encapsulated in a VIEW, leaving the selection of the key value as part of the eventual SELECT queries which can utilize that logical view. FWiW, the pivoting of only the data where key=1 should perform very fast; the given sample set producing only up to 7*21 rows would also perform very well.

As rows instead, the sets of those values can be easily processed using EXCEPTION JOIN or equivalent logic. The exception logic includes the rows where any value exists, that is not a subset of the chosen-as superset [the superset row in the physical layout]. Selecting the exceptions means negative logic in selection on the key values of the original file can omit those rows. So any key that includes any one or up to all of the same values as the superset will be selected as a valid subset, because any key that has at least one value not in the superset will be omitted [using NOT on the IN predicate].

That said, while the VIEW in the below script both is sufficiently dynamic to pivot the data and runs quickly, I did not test a two-step query, the first step to build the selection for [or the entire] dynamic query that performs the selection against each column for all values of the key=1; i.e. the method described in the following message\link, and that query by itself runs very quickly. The query which generates that select also runs very quickly, and is included at the bottom of the script.
http://archive.midrange.com/midrange-l/201107/msg00559.html

<code>

create table qtemp/kf (k dec, f1 char(3) default ' '
, f2 char(3) default ' ', f3 char(3) default ' '
, f4 char(3) default ' ', f5 char(3) default ' '
, f6 char(3) default ' ', f7 char(3) default ' '
)
;
insert into qtemp/kf (k,f1,f2,f3,f4,f5,f6) values
(1, 360, 160, 202, 400, 118, default)
,(2, 160, default, default, default, default, default)
,(3, 400, 202, 600, default, default, default)
,(4, 160, 118, default, default, default, default)
,(5, 202, default, default, default, default, default)
,(6, 400, default, default, default, default, default)
,(7, 111, default, default, default, default, default)
;
create unique index qtemp/kfi on qtemp/kf (k)
;
-- following includes c=blank rows for later IN predicate
-- selection. Included because union DISTINCT is defaulted
-- and no WHERE clause to omit them
create view qtemp/kfpivotx (k,c) as
( select k, f1 from kf
union select k, f2 from kf
union select k, f3 from kf
union select k, f4 from kf
union select k, f5 from kf
union select k, f6 from kf
union select k, f7 from kf
/* ...
union select k, f21 from kf */
)
;
-- similar concept to that already reviewed by the OP
-- but identical select as subselect versus literals
-- Joe proposed similar using a CTE to get k=1 row data
select * from kf
where f1 in (select c from kfpivotx where k=1)
and f2 in (select c from kfpivotx where k=1)
and f3 in (select c from kfpivotx where k=1)
and f4 in (select c from kfpivotx where k=1)
and f5 in (select c from kfpivotx where k=1)
and f6 in (select c from kfpivotx where k=1)
and f7 in (select c from kfpivotx where k=1)
/* ...
and f21 in (select c from kfpivotx where k=1) */
/* and k<>1 */
;
-- following excludes any c=blank rows for use by the
-- EXCEPTION JOIN selecting only non-blank values
create view qtemp/kfpivot (k,c) as
( select k, f1 from kf where f1 <>''
union all select k, f2 from kf where f2 <>''
union all select k, f3 from kf where f3 <>''
union all select k, f4 from kf where f4 <>''
union all select k, f5 from kf where f5 <>''
union all select k, f6 from kf where f6 <>''
union all select k, f7 from kf where f7 <>''
/* ...
union all select k, f21 from kf where f21<>'' */
)
;
-- next two queries select the key values from the original
-- PF that are not the exceptions to the subset requirement
select * from kf
where k not in (select l.k
from kfpivot l exception join
(select * from kfpivot where k=1) r
on l.c=r.c )
/* and k<>1 */
;
select * from kf
where k not in (select l.k
from kfpivot l left join
(select * from kfpivot where k=1) r
on l.c=r.c
where r.c is null)
/* and k<>1 */
;
-- build the dynamic query as IN tests against each column per:
-- http://archive.midrange.com/midrange-l/201107/msg00559.html
-- fetch that into a variable, prepare, declare, and open
with
inpred (ip) as
(select replace(
' IN ( "' concat f1 concat '","' concat
f2 concat '","' concat f3 concat '","' concat
f4 concat '","' concat f5 concat '","' concat
f6 concat '","' concat f7 concat /* '","' concat
...
f20 concat '","' concat f21 concat */ '" ) '
, '"', '''' )
from kf
where k=1
)
select 'select * from kf where f1 ' concat ip
concat 'and f2 ' concat ip concat 'and f3 ' concat ip
concat 'and f4 ' concat ip concat 'and f5 ' concat ip
concat 'and f6 ' concat ip concat 'and f7 ' concat ip
/* ...
concat 'and f20' concat ip concat 'and f21' concat ip */
from inpred

</code>

Regards, Chuck