Recommendations, anyone?
We know we need to purchase a new network switch for $20,000 because our
switch gets overloaded with traffic every Monday morning at 07:38 a.m. and
occasionally at random intervals during the regular work day.  Really
aggravating.
Symptom #1: +++++++++++++++++++++++++++++++
Our web server at IP address 10.0.0.39 loses its connection to the IBM i
at about 13:32.
+++++++++++++++++++++++++++++++++++++++++++++++++++++
Symptom #2:  
While the connection to the web server at 10.0.0.39 is interrupted, the
15 or so QZDASOINIT jobs on the IBM i go crazy, multiplying to 130 or
more QZDASOINIT jobs.
We learn about this only because users call telling us that response
time has degraded for jobs on the IBM i and on the web server.
(We are still on V5R3M5 for another 4 weeks, and then will upgrade to
POWER 6 and V5R4M5.)
Step 1.  I issue the command  DSPACTPJ SBS(QUSRWRK) PGM(QZDASOINIT)  to
get a bird's-eye view of things.
Step 2. ENDHOSTSVR SERVER(*DATABASE)                            
Step 3. ENDPJ SBS(QUSRWRK) PGM(QZDASOINIT) OPTION(*IMMED)  
Step 4. Wait 30 seconds for QZDASOINIT jobs to end.  I monitor with:
wrkactjob sbs(qusrwrk)      
Step 5.  STRHOSTSVR SERVER(*DATABASE)   to get things going again.
+++++++++++++++++++++
Here is some detail from today's occurrence.
     08/25/09  13:32:26.083920  QMHGSD       QSYS        0748     QCMD        QSYS
  Message . . . . :  -ENDHOSTSVR SERVER(*DATABASE)
     08/25/09  13:32:35.097808  QMHGSD       QSYS        0748     QCMD        QSYS
  Message . . . . :  -ENDPJ      SBS(QUSRWRK) PGM(QZDASOINIT) OPTION(*IMMED)
00   08/25/09  13:32:35.249968  QWTCCEPJ     QSYS        01A6     QCMD        QSYS
  Message . . . . :   End of prestart jobs in progress.
Message ID . . . . . . :   CPF0920
 Date sent  . . . . . . :   08/25/09      Time sent  . . . . . . :   13:32:35

 Message . . . . :   All prestart jobs are ending for program QZDASOINIT in
   QSYS.

 Cause . . . . . :   All prestart jobs for program QZDASOINIT in library QSYS
   in subsystem QUSRWRK are ending for reason 1.  See reason 1 shown below:
     1 - The End Prestart Job (ENDPJ) command was entered.
     2 - An error occurred when new jobs were being started.
     For reason 2, display the job log (DSPJOBLOG command) for the subsystem to
   determine the cause of the error.
     When all the prestart jobs have ended, message CPC0905 will appear in the
   subsystem job log and the system operator message queue.
 Recovery  . . . :   To start jobs, display the system operator (QSYSOPR)
   message queue (DSPMSG command). After message CPC0905 appears in the message
                             
              Job Log                             SAMSON   08/25/09 14:01:49
QPADEV003H      User  . . . . . . :   ELEHTI       Number . . . . . . . . . . . :   62
ELEHTI          Library . . . . . :   QGPL
SEV  DATE      TIME             FROM PGM     LIBRARY     INST     TO PGM      LIBRARY
  Cause . . . . . :   The prestart jobs for program QZDASOINIT in library QSYS
    in subsystem QUSRWRK are in the process of being ended.
     08/25/09  13:33:01.879488  QMHGSD       QSYS        0748     QCMD        QSYS
  Message . . . . :  -wrkactjob sbs(qusrwrk)
       08/25/09  13:33:28.966936  QMHGSD       QSYS        0748     QCMD        QSYS
    Message . . . . :  -STRHOSTSVR SERVER(*DATABASE)
  00   08/25/09  13:33:29.024400  QWTCCSPJ     QSYS        01E6     QC2SYS      QSYS
    To module . . . . . . . . . :   QC2SYS
    To procedure  . . . . . . . :   system
    Statement . . . . . . . . . :   6
    Message . . . . :   Start of prestart jobs in progress.
    Cause . . . . . :   The prestart jobs for program QZDASOINIT in library QSYS
      in subsystem QUSRWRK are being started.
  30   08/25/09  13:33:29.365800  QWDMMSG      QSYS        0117     QZSOSVCT    QSYS
    To module . . . . . . . . . :   QZSOSVCT
    To procedure  . . . . . . . :   QzsoAddRoutingTableEnts
    Statement . . . . . . . . . :   17
    Message . . . . :   Routing entry sequence number 600 already exists.
    Cause . . . . . :   One of the following errors occurred: -- The routing entry
      with sequence number 600 already exists. -- The sequence number (SEQNBR
      parameter) was not specified correctly. Recovery  . . . :   Omit the
      command, or change the sequence number (SEQNBR parameter) and then try the
      command again.  To change the routing entry, use the CHGRTGE command.
 
Job 619183/QUSER/QZDASRVSD ended on 08/25/09 at 13:32:29; 1 seconds used; en
All prestart jobs are ending for program QZDASOINIT in QSYS.
Job 621491/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:35; 1 seconds used; e
Job 621469/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:35; 1 seconds used; e
Job 621461/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:35; 1 seconds used; e
Job 621450/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:35; 1 seconds used; e
Job 621475/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:35; 1 seconds used; e
Job 621439/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:36; 1 seconds used; e
Job 621456/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:36; 1 seconds used; e
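
When 130+ of these job-end messages scroll past, a per-second tally can be easier to read than the raw list. The following is only a sketch in Python, run off the system against text copied from the message queue. The function name `tally_job_ends` and the regular expression are illustrative assumptions, and the sample holds just a few of the lines shown above (truncated exactly as they appear there).

```python
import re
from collections import Counter

# A few lines copied from the excerpt above, truncated as in the original.
SAMPLE = """\
Job 619183/QUSER/QZDASRVSD ended on 08/25/09 at 13:32:29; 1 seconds used; en
All prestart jobs are ending for program QZDASOINIT in QSYS.
Job 621491/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:35; 1 seconds used; e
Job 621469/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:35; 1 seconds used; e
Job 621439/QUSER/QZDASOINIT ended on 08/25/09 at 13:32:36; 1 seconds used; e
"""

# Matches the "Job number/user/name ended on date at time" part of each line.
# The pattern is an assumption based only on the lines shown in this post.
JOB_ENDED = re.compile(
    r"Job (?P<number>\d+)/(?P<user>\w+)/(?P<name>\w+) ended on "
    r"(?P<date>\d\d/\d\d/\d\d) at (?P<time>\d\d:\d\d:\d\d)"
)


def tally_job_ends(text):
    """Count job-end messages per (job name, end time)."""
    counts = Counter()
    for match in JOB_ENDED.finditer(text):
        counts[(match["name"], match["time"])] += 1
    return counts


if __name__ == "__main__":
    # Print one line per job name and second, ordered by job name then time.
    for (name, time), n in sorted(tally_job_ends(SAMPLE).items()):
        print(f"{time}  {name:<12} {n}")
```

Lines that are not job-end messages (such as "All prestart jobs are ending...") are simply skipped, so the full message-queue text can be pasted in unchanged.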
And this message finally appears 10 minutes after the initial problem
began:
Message ID . . . . . . :   TCP2617
 Date sent  . . . . . . :   08/25/09      Time sent  . . . . . . :   13:42:35

 Message . . . . :   TCP/IP connection to remote system 10.0.0.39 closed,
   reason code 1.

 Cause . . . . . :   The TCP/IP connection to remote system 10.0.0.39 has been
   closed. The connection was closed for reason code 1.  Full connection
   details for the closed connection include:
     - local IP address is 10.0.0.5
     - local port is 8471
     - remote IP address is 10.0.0.39
     - remote port is 1601
 Reason codes and their meanings follow:
     1 = TCP connection closed due to expiration of 10 minute FINWAIT2 timer.
     2 = TCP connection closed due to R2 retry threshold being run.
 
More...
From job . . . . . . . . . . . :   QTCPIP            
  User . . . . . . . . . . . . :     QTCP            
  Number . . . . . . . . . . . :     613770          
                                                     
From program . . . . . . . . . :   QTOCTCPI          
  Instruction  . . . . . . . . :     0764            
                                                     
To message queue . . . . . . . :   QSYSOPR           
  Library  . . . . . . . . . . :     QSYS            
                                                     
Time sent  . . . . . . . . . . :   13:42:35.562688   
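
A quick check of the timestamps may help when reading this message. TCP2617 was sent at 13:42:35, and CPF0920 (prestart jobs ending after ENDPJ) was sent at 13:32:35. The sketch below, in Python with an illustrative helper named `elapsed`, just subtracts the two. A gap of exactly 10 minutes would be consistent with the FINWAIT2 timer in reason code 1 having started when ENDPJ closed the host side of the connection, though that is an inference from the timestamps alone and not something the logs confirm.

```python
from datetime import datetime

# Date and time layout used in the messages above (MM/DD/YY HH:MM:SS).
FMT = "%m/%d/%y %H:%M:%S"


def elapsed(start, end):
    """Return the time between two 'MM/DD/YY HH:MM:SS' stamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Timestamps copied from the messages in this post.
endpj_sent = "08/25/09 13:32:35"    # CPF0920: prestart jobs ending after ENDPJ
tcp2617_sent = "08/25/09 13:42:35"  # TCP2617: connection to 10.0.0.39 closed

gap = elapsed(endpj_sent, tcp2617_sent)
print(gap)  # 0:10:00
```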
  