The problem description sounds as if you are exceeding the amount of memory allocated to the storage pool WAS is running in. At that point, paging becomes excessive and garbage collection can fall behind, and then just when you want DMPJVM to work, it doesn't.

This:

[12/4/06 23:47:43:931 CST] 00000011 SystemOut O C400WARNI: 485997/QEJBSVR/SERVER1 GC heap uses 107% of the non-reserved pool. JVM GC Heap Size(kB) Effective PoolSize(kB):408273 381196

is indicating that the heap is becoming too large for the pool. It also indicates that the effective pool size (the amount of memory currently in the pool) is approximately 372 MB (381196/1024). That is pretty small in general for WAS, and the GC heap (408273K, roughly 399 MB) is not particularly large for WAS - is it possible other workloads are taking a lot of memory?

Note: The same support (heap monitor) that sends the message above to SystemOut.log also sends messages to the QSYSOPR message queue.

These messages are a side effect of what is happening with memory usage:

[11/17/06 8:58:11:480 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 8" (00000186) has been active for 730294 milliseconds and may be hung. There is/are 9 thread(s) in total in the server that may be hung.

and should not be taken to be a thread issue. If the GC recovered (say you were able to push a bunch of memory into the base pool), you would most likely see corresponding messages saying the threads had "come back" and were no longer possibly hung.

One recommendation is to run WAS in its own pool with dedicated memory - how much memory would depend on what your server utilizes at peak times. DMPJVM is a good way to check that (when it can complete).

This FAQ is for V5.1 and earlier but tells how to associate the WAS subsystem with a different shared pool in order to isolate it.
For V6, the subsystem is QWAS6 and the *SBSD is QWAS6/QWAS6.

If the problem continues, you may want to contact IBM support so that they can do some analysis of the problem.

Frances Stewart
WebSphere Application Server for iSeries, Technical Team Lead/Architect
External web site: http://www.iseries.ibm.com/websphere
Team web site: http://w3.rchland.ibm.com/~was
E-mail: francess@xxxxxxxxxx
IBM Rochester

"Todd Bryant" <tbryant@nufoundation.org>
Sent by: java400-l-bounces@midrange.com
12/05/2006 01:27 PM
Please respond to: Java Programming on and around the iSeries / AS400 <java400-l@midrange.com>
To: <java400-l@xxxxxxxxxxxx>
cc:
Subject: WAS non-responsive

We have been having a problem since we upgraded to WAS 6.0.2.15 on the iSeries. Unfortunately, we have little information at this time, but I thought I would throw out what is happening and see if anyone has any input.

On three separate occasions WAS has ground to a halt and stopped responding to page requests. It is not really hung, per se. It is more like it is stuck in a loop or paging out memory or something. When it happens, CPU usage, when viewed using WRKACTJOB, shows minimal usage, not only for WAS but also for the whole system. Other jobs, e.g., interactive and batch jobs, seem to be running fine. We had one instance where someone had two remote data connections that were using only a little CPU, but we thought those jobs might be doing intensive I/O, so we killed them and WAS became responsive immediately. However, in the other cases we could not definitively identify another job as the cause.

The last time this happened I tried running the DMPJVM command, but it simply hung and never finished. I have also tried to get into the admin to check the Tivoli Performance Viewer to look at heap size, but the admin app would not come up.

WAS is currently set up to use the BASE memory pool.
We have 5 or 6 GB of RAM in the machine and BASE normally has 2.5 to 3.5 GB allocated to it, depending on interactive. According to the Tivoli Performance Viewer in the WAS admin, WAS usually has a memory allocation of 500-700 MB, and the used memory is 250-300 MB.

We have occasionally gotten some error messages in the logs that make me think that this may be a memory issue:

[12/4/06 23:47:43:931 CST] 00000011 SystemOut O C400WARNI: 485997/QEJBSVR/SERVER1 GC heap uses 107% of the non-reserved pool. JVM GC Heap Size(kB) Effective PoolSize(kB):408273 381196

I am not sure how to read this. It appears it may be saying that the pool size has shrunk to 381 MB and the heap is using 408 MB. One hypothesis I have at this point is that other jobs in the BASE memory pool are using a large amount of RAM, that the RAM available to WAS is being cut back because of it, and that WAS is either running the GC constantly or, worse, paging out to disk.

One time when we tried to shut down the server after it became non-responsive, we had this in the logs:

[11/17/06 8:58:01:651 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 32" (00000fcd) has been active for 736230 milliseconds and may be hung. There is/are 2 thread(s) in total in the server that may be hung.
[11/17/06 8:58:07:848 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 34" (00000fcf) has been active for 735964 milliseconds and may be hung. There is/are 3 thread(s) in total in the server that may be hung.
[11/17/06 8:58:08:712 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 17" (00000fbe) has been active for 737383 milliseconds and may be hung. There is/are 4 thread(s) in total in the server that may be hung.
[11/17/06 8:58:09:596 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 20" (00000fc1) has been active for 736983 milliseconds and may be hung. There is/are 5 thread(s) in total in the server that may be hung.
[11/17/06 8:58:10:161 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 33" (00000fce) has been active for 736005 milliseconds and may be hung. There is/are 6 thread(s) in total in the server that may be hung.
[11/17/06 8:58:10:809 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 31" (00000fcc) has been active for 736379 milliseconds and may be hung. There is/are 7 thread(s) in total in the server that may be hung.
[11/17/06 8:58:11:222 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 4" (00000182) has been active for 739389 milliseconds and may be hung. There is/are 8 thread(s) in total in the server that may be hung.
[11/17/06 8:58:11:480 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 8" (00000186) has been active for 730294 milliseconds and may be hung. There is/are 9 thread(s) in total in the server that may be hung.
[11/17/06 8:58:11:837 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 19" (00000fc0) has been active for 730323 milliseconds and may be hung. There is/are 10 thread(s) in total in the server that may be hung.
[11/17/06 8:58:12:131 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 44" (00000fda) has been active for 701857 milliseconds and may be hung. There is/are 11 thread(s) in total in the server that may be hung.
[11/17/06 8:58:13:075 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 18" (00000fbf) has been active for 737204 milliseconds and may be hung. There is/are 12 thread(s) in total in the server that may be hung.
[11/17/06 8:58:13:326 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 43" (00000fd9) has been active for 704627 milliseconds and may be hung. There is/are 13 thread(s) in total in the server that may be hung.
[11/17/06 8:58:13:548 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 13" (00000fba) has been active for 738022 milliseconds and may be hung. There is/are 14 thread(s) in total in the server that may be hung.
[11/17/06 8:58:13:886 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 30" (00000fcb) has been active for 736480 milliseconds and may be hung. There is/are 15 thread(s) in total in the server that may be hung.
[11/17/06 8:58:14:119 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 29" (00000fca) has been active for 736551 milliseconds and may be hung. There is/are 16 thread(s) in total in the server that may be hung.
[11/17/06 8:58:14:338 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 24" (00000fc5) has been active for 736812 milliseconds and may be hung. There is/are 17 thread(s) in total in the server that may be hung.
[11/17/06 8:58:14:548 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 28" (00000fc9) has been active for 736611 milliseconds and may be hung. There is/are 18 thread(s) in total in the server that may be hung.
[11/17/06 8:58:14:821 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 12" (00000fb9) has been active for 738008 milliseconds and may be hung. There is/are 19 thread(s) in total in the server that may be hung.
[11/17/06 8:58:15:009 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 23" (00000fc4) has been active for 730255 milliseconds and may be hung. There is/are 20 thread(s) in total in the server that may be hung.
[11/17/06 8:58:15:340 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 42" (00000fd8) has been active for 713749 milliseconds and may be hung. There is/are 21 thread(s) in total in the server that may be hung.
[11/17/06 8:58:15:515 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 14" (00000fbb) has been active for 737527 milliseconds and may be hung. There is/are 22 thread(s) in total in the server that may be hung.
[11/17/06 8:58:15:705 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 35" (00000fd0) has been active for 735872 milliseconds and may be hung. There is/are 23 thread(s) in total in the server that may be hung.
[11/17/06 8:58:15:986 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 26" (00000fc7) has been active for 736671 milliseconds and may be hung. There is/are 24 thread(s) in total in the server that may be hung.
[11/17/06 8:58:16:150 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 47" (00000fdd) has been active for 687165 milliseconds and may be hung. There is/are 25 thread(s) in total in the server that may be hung.
[11/17/06 8:58:16:310 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 37" (00000fd2) has been active for 735783 milliseconds and may be hung. There is/are 26 thread(s) in total in the server that may be hung.
[11/17/06 8:58:16:497 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 10" (00000fb7) has been active for 738634 milliseconds and may be hung. There is/are 27 thread(s) in total in the server that may be hung.
[11/17/06 8:58:16:765 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 25" (00000fc6) has been active for 736671 milliseconds and may be hung. There is/are 28 thread(s) in total in the server that may be hung.
[11/17/06 8:58:16:969 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 36" (00000fd1) has been active for 735868 milliseconds and may be hung. There is/are 29 thread(s) in total in the server that may be hung.
[11/17/06 8:58:17:138 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 39" (00000fd4) has been active for 735728 milliseconds and may be hung. There is/are 30 thread(s) in total in the server that may be hung.
[11/17/06 8:58:17:309 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 27" (00000fc8) has been active for 736611 milliseconds and may be hung. There is/are 31 thread(s) in total in the server that may be hung.
[11/17/06 8:58:17:506 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 16" (00000fbd) has been active for 737454 milliseconds and may be hung. There is/are 32 thread(s) in total in the server that may be hung.
[11/17/06 8:58:17:720 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 46" (00000fdc) has been active for 689796 milliseconds and may be hung. There is/are 33 thread(s) in total in the server that may be hung.
[11/17/06 8:58:18:110 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 11" (00000fb8) has been active for 738302 milliseconds and may be hung. There is/are 34 thread(s) in total in the server that may be hung.
[11/17/06 8:58:18:294 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 22" (00000fc3) has been active for 736911 milliseconds and may be hung. There is/are 35 thread(s) in total in the server that may be hung.
[11/17/06 8:58:18:604 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 6" (00000184) has been active for 730294 milliseconds and may be hung. There is/are 36 thread(s) in total in the server that may be hung.
[11/17/06 8:58:18:830 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 21" (00000fc2) has been active for 736890 milliseconds and may be hung. There is/are 37 thread(s) in total in the server that may be hung.
[11/17/06 8:58:19:136 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 41" (00000fd6) has been active for 730189 milliseconds and may be hung. There is/are 38 thread(s) in total in the server that may be hung.
[11/17/06 8:58:19:326 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 48" (00000fde) has been active for 675698 milliseconds and may be hung. There is/are 39 thread(s) in total in the server that may be hung.
[11/17/06 8:58:19:597 CST] 000001a8 ThreadMonitor W WSVR0605W: Thread "WebContainer : 40" (00000fd5) has been active for 735688 milliseconds and may be hung. There is/are 40 thread(s) in total in the server that may be hung.

This made me wonder if it was a thread issue, but I could also see this happening if the GC was dominating or the JVM was paging to disk.

What I am wondering is whether anyone else has had problems like this, or whether anyone can give me any ideas on what to check or change.

Thanks for any help you can give us.

Todd Bryant
Programmer/Analyst
University of Nebraska Foundation
phone#: 402.458.1131
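
For anyone else trying to read the C400WARNI message quoted in this thread, here is a minimal sketch of the arithmetic. It assumes the two numbers after the colon are the GC heap size and the effective pool size in kB, in that order, as the column labels in the message suggest. The regular expression is based only on the single sample above, and the function name is made up for illustration; it is not part of WAS or of the heap monitor.

```python
import re

# Assumed message layout, inferred from the one sample in this thread:
# "... GC heap uses NNN% of the non-reserved pool. JVM GC Heap Size(kB)
#  Effective PoolSize(kB):<heap_kb> <pool_kb>"
HEAP_WARNING = re.compile(
    r"GC heap uses (\d+)% of the non-reserved pool\."
    r".*?PoolSize\(kB\):\s*(\d+)\s+(\d+)",
    re.DOTALL,
)


def parse_heap_warning(message):
    """Return heap and pool sizes from a C400WARNI line, or None if no match."""
    match = HEAP_WARNING.search(message)
    if match is None:
        return None
    reported_pct, heap_kb, pool_kb = (int(g) for g in match.groups())
    return {
        "reported_pct": reported_pct,              # percentage printed in the log
        "heap_kb": heap_kb,
        "pool_kb": pool_kb,
        "heap_mb": heap_kb / 1024,                 # kB -> MB
        "pool_mb": pool_kb / 1024,
        "computed_pct": 100 * heap_kb / pool_kb,   # heap as a share of the pool
    }


sample = (
    "[12/4/06 23:47:43:931 CST] 00000011 SystemOut O C400WARNI: "
    "485997/QEJBSVR/SERVER1 GC heap uses 107% of the non-reserved pool. "
    "JVM GC Heap Size(kB) Effective PoolSize(kB):408273 381196"
)

info = parse_heap_warning(sample)
print(f"heap {info['heap_mb']:.1f} MB, pool {info['pool_mb']:.1f} MB, "
      f"heap/pool {info['computed_pct']:.1f}%")
```

Dividing the heap size by the pool size reproduces the 107% in the message, which is consistent with the reading that the heap (about 399 MB) no longer fits in the memory currently available to the pool (about 372 MB).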