Re: Checking Thresholds in a file using SQL -- MIDRANGE-L

Thanks Chuck for a detailed solution. I am sure I can use this.

I have a follow-up question:

The file is a DDS defined multi-member file. It has a logical over it by K1
(not the actual field name by the way), but no index with RRN.

During processing I could build a Table and index on the fly in QTEMP and
copy data to it before running my sql.

Given that the file may have anywhere between a handful to a couple of
million records, and hundreds of such files are being processed in a day
(by different jobs, one file per job), would it make sense (performance
wise) to go the table/index route or process the data directly from the
input file?

Thanks
Vinay

On Tue, Oct 20, 2015 at 5:19 PM, CRPence <crpbottle@xxxxxxxxx> wrote:

On 20-Oct-2015 12:59 -0500, Vinay Gavankar wrote:

I have a requirement where I need to check for number times a value
occurs in a field in the file.

The file has a field K1 (non-unique) and another field F1

For the records with same value of K1, F1 can have different values.
I need to consider only the value of last record (highest RRN). If my
file has 3 records for K1=A (in RRN sequence)

A XXX
A YYY
A ZZZ

then I want to consider only ZZZ for processing.

drop table kf
;
create table kf (k1 char, f1 char(3) not null default '')
;
insert into kf values
('A', 'XXX'), ('A', 'YYY'), ('A', 'ZZZ')
;

The following query returns the above "only ZZZ" result set for the 'A'
values shown above, and the result is of course 'ZZZ':

select P.f1
from kf as P
where P.k1='A'
and RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1
)

I need to get a count of unique, non-blank values of F1 in the file
(considering only the last record for K1).

To give an example, if my full file looks like this:

Asterisks delimiting row-data indicates that row has a value of F1 for
which the maximum RRN() is found within the K1 grouping.

A XXX
A YYY
*A ZZZ*
*B XXX*
C MMM
*C ZZZ*
*D XXX*
*E UUU*
*F ZZZ*
G (blanks)
H (blanks)

then my counts should be:

ZZZ 3
XXX 2
UUU 1

drop table kf
;
create table kf (k1 char, f1 char(3) not null default '')
;
insert into kf values
('A', 'XXX'), ('A', 'YYY'), ('A', 'ZZZ')
, ('B', 'XXX'), ('C', 'MMM'), ('C', 'ZZZ')
, ('D', 'XXX'), ('E', 'UUU'), ('F', 'ZZZ')
, ('G', ' '), ('H', ' ')
;

The above described "counts" result-set can be obtained with the
following query:

select P.f1, count(*) as tot
from kf as P
where RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1

If the highest count is greater than a certain number (threshold), I

have to take a particular action.

create global variable threshold bigint default 2
;

The following query returns a result-set identifying 'ZZZ' as having
three occurrences because that is the only value for which the count
exceeded the threshold, i.e. 2, as declared above.

select P.f1, count(*) as tot
from kf as P
where RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1
having count(*) > threshold
fetch first row only

So basically, I just need the highest count. The value of F1 for
which highest count occurred is not important.

The following obtains as a result-set just the count, but ignores the
requirement for a threshold; the threshold of course, can be compared
against the fetched result.

select count(*) as tot
from kf as P
where RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1
order by tot desc
fetch first row only

Or if the actual count is only relevant given the threshold is exceeded,
then the following could be used to return an empty result-set when the
threshold is not exceeded.

select A.tot
from (
select count(*) as tot
from kf as P
where RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1
order by tot desc
fetch first row only
) as A
where A.tot > threshold

Is this possible to do with sql (embedded in RPGLE).

Sure. See above [tested on v5r3] for what might function as desired.

Or what would be the most efficient (I/O and CPU wise) way of doing
this?

The following keyed Access Path from the derived INDEX might be useful
if implementing via SQL similar to what I offered above; that is not
available until v6r1:

create index kfx on kf (k1, rrn(kf) desc)

The file actually has 10 fields (F1 thru f10) and the above needs to
be done for all the 10 fields (could be a different threshold value
for all fields).

Not too sure what exactly that means without example data nor DDL to
clarify. So guessing what is meant, perhaps the following; but as shown
extending to only one additional field, yet what can be done for one
additional field, can be repeated for each additional after that one:

drop table kf
;
create table kf
( k1 char
, f1 char(3) not null default ''
, f2 char(3) not null default ''
)
;
insert into kf values
('A', 'XXX', 'LLL')
, ('A', 'YYY', 'KKK')
, ('A', 'ZZZ', 'JJJ')
, ('B', 'XXX', 'KKK')
, ('C', 'MMM', 'PPP')
, ('C', 'ZZZ', 'NNN')
, ('D', 'XXX', ' ')
, ('E', 'UUU', 'JJJ')
, ('F', 'ZZZ', 'NNN')
, ('G', ' ', 'KKK')
, ('H', ' ', 'KKK')
;

Similarly described again, as the previous setup:

• Asterisks delimited row-data indicates that row has a value of F1 for
which the maximum RRN() is found within the K1 grouping.

• Squiggly bracket delimited row-data indicates that row has a value of
F2 for which the maximum RRN() is found within the K1 grouping.

A XXX LLL
A YYY KKK
{*A ZZZ* JJJ}
{*B XXX* KKK}
C MMM PPP
{*C ZZZ* NNN}
*D XXX* blank
{*E UUU* JJJ}
{*F ZZZ* NNN}
{ G blank KKK}
{ H blank KKK}

then the counts should be for each of the columns:

F1 F2
--- ---
ZZZ 3 JJJ 2
XXX 2 NNN 2
UUU 1 KKK 3

So to include only the maximum count from each, that would be:

ZZZ 3 KKK 3

And to obtain only the count because the F# value is not relevant, the
following is the result set:

3 3

The following is a way to obtain the above result set [using something
valid on v5r3], but as row data instead of columns. There may be a way to
use the VALUES row clause and scalar fullselect instead of what is shown
here using effectively the same Nested Table Expressions (NTE) [aka derived
tables] but with an extra column as literal value expression to designate
whence the count value was derived, and the row data collected in a UNION
ALL. As noted, extending the query just requires repeating the same patter
for each additional column.

select F1.* from
( select count(*) as maxcnt
, 'F1' as c
from kf as P
where RRN(P) = ( select MAX(RRN(S)) from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1
order by maxcnt desc
fetch first row only ) as F1
union all
select F2.* from
( select count(*) as maxcnt
, 'F2' as c
from kf as P
where RRN(P) = ( select MAX(RRN(S)) from kf as S
where S.k1 = P.k1 )
and P.f2<>''
group by P.f2
order by maxcnt desc
fetch first row only ) as F2
/* additional union all queries precede the order by */
order by c /* or as desired */

Any help will be greatly appreciated.

There may be simpler and\or more succinct ways, but seems the above
might be functional for the given scenario, if I inferred correctly what is
the extended variation... and if row data is acceptable vs columns -- else
revisions would be required, but probably none I could test, limited to
v5r3 capabilities.

--
Regards, Chuck

--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.