Re: improve the performance of a query by using multiple threads in a java stored procedure -- JAVA400-L

Do you have SMP installed and turned on? That will take your queries and do the multithreading for you. See:

http://www-03.ibm.com/systems/i/software/db2/db2sym.html

You need to have your system set up with at least multiple virtual processors if not multiple real processors in order for it to really shine.

Consider this class. Mike Cain out of Rochester is one of IBMs DB2 gurus..

http://public.dhe.ibm.com/partnerworld/pub/pdf/courses/4aea.pdf

Jim Oberholtzer
CEO/Chief Technical Architect
Agile Technology Architects, LLC

On 3/31/2010 8:46 PM, Yan Zhang wrote:

*This message was transferred with a trial version of CommuniGate(r) Pro*
I have a question about how to improve the performance of a SQL query by
using multiple threads/connections.

Here is my query:

select A1.aID,D1.dID ,
decimal(count(distinct case when(tdate between date('2009-10-01') and
date('2010-01-31') ) THEN
f1.dist_acct else null end)) count1,
decimal(count(distinct case when(tdate between date('2008-10-01') and
date('2009-01-31') ) THEN
f1.dist_acct else null end)) as count2
from fctsales f1,dimtime t1,dimaccts a1,dimdist D1,dimitem I1
where t1.date = f1.date and a1.dist_acct = f1.dist_acct and D1.DIST =
f1.DIST and I1.ITEM = f1.ITEM
and ((t1.tdate between date('2009-10-01') and date('2010-01-31')) or
(t1.tdate between date('2008-10-01') and date('2009-01-31')))
group by A1.aID,D1.dID;

Table information:
fctsales (100 million records) -- 4 fields: dist, dist_acct, item, date, and
the primary key is<dist_acct, item, date>;
dimaccts (1 million records) -- 10 fields: dist_acct, aID, .... primary key
is<dist_acct>;
dimdist (500 records) -- 10 fields: dist, dID,.... primary key is<dist>;
dimitem (200 records) -- 10 fields: item, .... primary key is<item>;
dimtime (1500 records, around 2 years dates) -- 10 fields: date, ... primary
key is<date>.

Requirement: get the results from the above query within 10-15 seconds for
various date ranges.

------------------------------------------------
Because the fctsales table is huge, we have to use other resources to trade
with the speed. Here is my thoughs: Whether we can split the fctsales table
to some subsets (say, 5 subsets), run these 5 subset parallel, and combine
the results. For example, for above query (assuming there are 5 possible dID
in Dimdist table), we can split it into:

select A1.aID,D1.dID,
decimal(count(distinct case when(tdate between date('2009-10-01') and
date('2010-01-31') ) THEN f1.dist_acct else null end)) count1,
decimal(count(distinct case when(tdate between date('2008-10-01') and
date('2009-01-31') ) THEN f1.dist_acct else null end)) as count2
from fctsales f1,dimtime t1,dimaccts a1,dimdist D1,dimitem I1
where t1.date = f1.date and a1.dist_acct = f1.dist_acct and D1.DIST =
f1.DIST and I1.ITEM = f1.ITEM
and D1.dID in ('0001')
and ((t1.tdate between date('2009-10-01') and date('2010-01-31')) or
(t1.tdate between date('2008-10-01') and date('2009-01-31')))
group by A1.aID,D1.dID;

select A1.aID,D1.dID,
decimal(count(distinct case when(tdate between date('2009-10-01') and
date('2010-01-31') ) THEN f1.dist_acct else null end)) count1,
decimal(count(distinct case when(tdate between date('2008-10-01') and
date('2009-01-31') ) THEN f1.dist_acct else null end)) as count2
from fctsales f1,dimtime t1,dimaccts a1,dimdist D1,dimitem I1
where t1.date = f1.date and a1.dist_acct = f1.dist_acct and D1.DIST =
f1.DIST and I1.ITEM = f1.ITEM
and D1.dID in ('0002')
and ((t1.tdate between date('2009-10-01') and date('2010-01-31')) or
(t1.tdate between date('2008-10-01') and date('2009-01-31')))
group by A1.aID,D1.dID;

.
.
.

select A1.aID,D1.dID,
decimal(count(distinct case when(tdate between date('2009-10-01') and
date('2010-01-31') ) THEN f1.dist_acct else null end)) count1,
decimal(count(distinct case when(tdate between date('2008-10-01') and
date('2009-01-31') ) THEN f1.dist_acct else null end)) as count2
from fctsales f1,dimtime t1,dimaccts a1,dimdist D1,dimitem I1
where t1.date = f1.date and a1.dist_acct = f1.dist_acct and D1.DIST =
f1.DIST and I1.ITEM = f1.ITEM
and D1.dID in ('0005')
and ((t1.tdate between date('2009-10-01') and date('2010-01-31')) or
(t1.tdate between date('2008-10-01') and date('2009-01-31')))
group by A1.aID,D1.dID;

We can perform "union" to union the results. Is this idea reasonable?

If I want to use Java stored procedure to implement it, I want to use
muliple threads (5 threads in this example) to get the database connection,
so that these 5 queries can run at the same time. Then I will get 5
resultSet. But how to union these resultSet in my Java stored procedure
becomes another problem. It is even more complex if the combined resultSet
has to be joined with resultSet generated from other queries.

Any comments are welcome!

thanks,
yan