On 15-May-2014 00:13 -0500, Imad Moukaddam wrote:
We are encountering a very strange behavior in iSeries response time,
and we are running out of possible causes for this phenomenon:
We developed a JAVA application that launches COBOL services on the
iSeries via a Socket. Everything seemed to run smoothly.
Then we tried to test the overall performance of the application, so
we used a test tool to simulate hundreds of simultaneous connections,
and this is where the strange behavior is being noticed: at first
everything seems to be all right, with excellent response time from
the service on the iSeries machine and the CPU running at around
85-90%. Then suddenly, at a certain point, the response time from the
service was more than 10 seconds, and after that point the response
times were very erratic; the same service gave very quick responses
for some connections and very slow responses for other connections.
More strangely, we noticed that when this behavior occurs, the CPU
utilization decreased to less than 50 or 40%... (meaning, while we
are encountering performance issues, the iSeries CPU is resting!)
Unlikely "resting". More likely "paging"; most likely approaching a
level of paging typically referred to as "thrashing". While the system
is paging memory for the jobs to get work done, some jobs are in a
wait state and thus unable to utilize the CPU. Although that means
other, non-waiting jobs can utilize the CPU, the system must still
dispatch work across the total number of jobs according to priority
and time-slice restrictions. Judging by the CPU utilization before the
slowdown, the bottleneck on throughput likely was mostly the CPU, but
after the slowdown the bottleneck is almost surely memory. When the
memory requirements of the active jobs exceed the available memory [in
a memory storage pool], the in-use memory must effectively be swapped
between permanent [disk] and temporary [main] storage so each job's
memory requirements can be met. Thus more memory can further push out
the point at which the memory paging requirements start causing
interruptions [i.e. a bottleneck] in the ability of a job to most
efficiently take advantage of the CPU.
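If it helps to confirm the paging theory, the fault rates per memory
pool can be watched with WRKSYSSTS while the load test runs. The same
numbers can also be sampled programmatically; the following is only a
minimal sketch, assuming the jt400 [JTOpen] JDBC driver on the
classpath and a release that provides the QSYS2.MEMORY_POOL_INFO
catalog view [the host name and credentials are placeholders, and the
columns are printed generically rather than guessing at their names]:

  // Sketch: dump the per-pool metrics [including fault rates] that
  // WRKSYSSTS shows, via SQL. View availability varies by release.
  import java.sql.*;

  public class PoolFaultSample {
      public static void main(String[] args) throws Exception {
          Class.forName("com.ibm.as400.access.AS400JDBCDriver");
          try (Connection c = DriverManager.getConnection(
                   "jdbc:as400://YOURSYSTEM", "USER", "PASSWORD");
               Statement s = c.createStatement();
               ResultSet rs = s.executeQuery(
                   "SELECT * FROM QSYS2.MEMORY_POOL_INFO")) {
              ResultSetMetaData md = rs.getMetaData();
              while (rs.next()) {
                  for (int i = 1; i <= md.getColumnCount(); i++) {
                      System.out.println(md.getColumnLabel(i)
                          + " = " + rs.getString(i));
                  }
                  System.out.println("----");
              }
          }
      }
  }

A fault rate that climbs sharply as the response times degrade would
support the thrashing explanation.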
It is like the requests were not getting through to the service!
Is there any explanation for this?
Hardware fault? (memory, disk...)
Probably not a "fault" as in /failure/.
But coincidentally, "faulting" is a term that describes obtaining
memory in a non-predictive manner; i.e. obtained expensively
[impacting overall performance]. The term "paging memory" [without
reference to a "fault"] refers to memory being obtained smoothly,
because the work asked explicitly for that data to be loaded into
memory. The term "page fault" refers to memory being obtained due to
work that requires something to be in memory, but for which that work
did not explicitly ask in advance to load that memory [or had asked
with an asynchronous request, but the actual reference to the memory
occurred before the system had processed the explicit request to load
that memory from disk].
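If an application-level analogy helps [this is only an analogy; the
real mechanism lives inside the operating system], the difference is
whether the load was requested far enough ahead of the reference:

  // Analogy only: an explicit asynchronous "load" avoids a stall only
  // if it completes before the data is referenced; referencing the
  // data with no request in flight stalls for the full I/O time, much
  // as a page fault does.
  import java.util.concurrent.*;

  public class PrefetchAnalogy {
      static byte[] loadFromDisk() throws InterruptedException {
          Thread.sleep(100);      // simulate slow disk I/O
          return new byte[4096];  // simulate one loaded page
      }

      public static void main(String[] args) throws Exception {
          ExecutorService io = Executors.newSingleThreadExecutor();
          // "Paging": ask in advance, do other work, then reference.
          Future<byte[]> prefetch =
              io.submit(PrefetchAnalogy::loadFromDisk);
          Thread.sleep(150);      // other work; get() below is instant
          byte[] page = prefetch.get();
          // "Page fault": no request was made in advance, so the
          // reference itself must wait for the whole load.
          byte[] faulted = loadFromDisk();
          io.shutdown();
      }
  }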
Too many QZDASOINIT jobs open?
Many database connections will inherently add to memory requirements
[for the jobs alone], and can potentially [very likely] lead to
increased memory use to implement the individual [usually query]
requests performed in those jobs. Yet if the jobs perform work using
the same objects and data, the memory requirements might not have a
huge impact. As well, common memory can be /fixed/ to prevent the
system from paging out that memory, only to have to page-fault that
data back into memory a few moments later.
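On the JAVA side, that suggests bounding how many QZDASOINIT jobs can
be driven at once, typically with a connection pool. A minimal sketch
[host, credentials, and pool size are placeholders; a production
application would more likely use an existing pooling DataSource]:

  // Cap concurrent database connections: all requests share a fixed
  // set of connections instead of each opening its own QZDASOINIT job.
  import java.sql.*;
  import java.util.concurrent.*;

  public class BoundedConnections {
      private final BlockingQueue<Connection> pool =
          new LinkedBlockingQueue<>();

      public BoundedConnections(int size) throws Exception {
          Class.forName("com.ibm.as400.access.AS400JDBCDriver"); // jt400
          for (int i = 0; i < size; i++) {
              pool.add(DriverManager.getConnection(
                  "jdbc:as400://YOURSYSTEM", "USER", "PASSWORD"));
          }
      }

      // Borrow a connection; blocks when all are busy rather than
      // opening yet another server job.
      public Connection borrow() throws InterruptedException {
          return pool.take();
      }

      public void release(Connection c) {
          pool.add(c);
      }
  }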
We have tried to identify any possible lock on the database, but
nothing appeared to be faulty at this level either.
Locks are just one of many different types of possible waits. As
already alluded to, the waits in the described scenario are likely
disk I/O waits to page memory.
Any help would be appreciated
The topic is probably not specific to the use of JAVA, thus probably
eligible for the midrange-l. However, general performance issues for
a system are very specific to that system, due to the environment:
the system and application configuration, and how the system and
applications are being utilized. The best bet is to utilize the
performance tools and experts [with that tooling] to help direct
where to look to improve the overall throughput. The never-ending
issue with such performance work is chasing the next bottleneck after
the issue causing the current bottleneck is deemed resolved [or as
good as it will get].
IBM sells tooling and [consulting] services to assist in that regard.
Other companies and consultants do as well.
FWiW: Given there are apparent database requests, the database
tooling [e.g. the index adviser] might be one place to start looking.
For example, an index might enable limiting the number of database
pages required to complete a query request, thus reducing the memory
footprint of the QZDASOINIT jobs that run the particular SQL query
for which the index was advised.
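For illustration only [the library, table, column, and index names
here are invented; the adviser reports the actual ones], creating
such an advised index from a JDBC connection might look like:

  // Hypothetical: create the index the adviser suggested, so the
  // optimizer can touch fewer pages for the offending query.
  import java.sql.*;

  public class CreateAdvisedIndex {
      public static void main(String[] args) throws Exception {
          Class.forName("com.ibm.as400.access.AS400JDBCDriver");
          try (Connection c = DriverManager.getConnection(
                   "jdbc:as400://YOURSYSTEM", "USER", "PASSWORD");
               Statement s = c.createStatement()) {
              s.executeUpdate("CREATE INDEX MYLIB.ORDERS_BY_CUST"
                  + " ON MYLIB.ORDERS (CUSTOMER_ID)");
          }
      }
  }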