On 15-May-2014 00:13 -0500, Imad Moukaddam wrote:
We are encountering a very strange behavior in iSeries response time,
and we are running out of possible causes for this phenomenon:
We developed a JAVA application that launches COBOL services on the
iSeries via a Socket. Everything seemed to run smoothly.
Then we tried to test the overall performance of the application, so
we used a test tool to simulate hundreds of simultaneous connections,
and this is where the strange behavior is being noticed: at first
everything seems to be all right, with excellent response time from
the service on the iSeries machine and the CPU running at around
85-90%. Then suddenly, at a certain point, the response time from the
service was more than 10 seconds, and after that point the response
times were very erratic; the same service gave very quick responses
for some connections and very slow responses for other connections.
More strangely, we noticed that when this behavior occurs, the CPU
utilization decreased to less than 50 or 40%... (meaning, while we
are encountering performance issues, the iSeries CPU is resting!)
Unlikely "resting". More likely "paging"; most likely approaching a
level of paging typically referred to as "thrashing". While the system
is paging memory for the jobs to get work done, some jobs are in a
wait state and thus unable to utilize the CPU. Although that means
other, non-waiting jobs can utilize the CPU, the system must still
dispatch work across the total number of jobs according to priority
and time-slice restrictions. Judging by the CPU utilization before the
slowdown, the bottleneck on throughput likely was mostly the CPU, but
after the slowdown the bottleneck is almost surely memory. When the
memory requirements of the active jobs exceed the available memory [in
a memory storage pool], the in-use memory must effectively be swapped
between permanent [disk] and temporary [main] storage so each job's
memory requirements can be met. Thus more memory can further push out
the point at which the memory paging requirements start causing
interruptions [i.e. a bottleneck] in the ability of a job to most
efficiently take advantage of the CPU.
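If it helps to confirm the paging theory, the fault rates per memory
pool can be watched with WRKSYSSTS while the load test runs. The same
numbers can also be sampled programmatically; the following is only a
minimal sketch, assuming the jt400 [JTOpen] JDBC driver on the
classpath and a release that provides the QSYS2.MEMORY_POOL_INFO
catalog view [the host name and credentials are placeholders, and the
columns are printed generically rather than guessing at their names]:

  // Sketch: dump the per-pool metrics [including fault rates] that
  // WRKSYSSTS shows, via SQL. View availability varies by release.
  import java.sql.*;

  public class PoolFaultSample {
      public static void main(String[] args) throws Exception {
          Class.forName("com.ibm.as400.access.AS400JDBCDriver");
          try (Connection c = DriverManager.getConnection(
                   "jdbc:as400://YOURSYSTEM", "USER", "PASSWORD");
               Statement s = c.createStatement();
               ResultSet rs = s.executeQuery(
                   "SELECT * FROM QSYS2.MEMORY_POOL_INFO")) {
              ResultSetMetaData md = rs.getMetaData();
              while (rs.next()) {
                  for (int i = 1; i <= md.getColumnCount(); i++) {
                      System.out.println(md.getColumnLabel(i)
                          + " = " + rs.getString(i));
                  }
                  System.out.println("----");
              }
          }
      }
  }

A fault rate that climbs sharply as the response times degrade would
support the thrashing explanation.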
It is like the requests were not getting through to the service!
Is there any explanation for this?
Hardware fault? (memory, disk...)
Probably not a "fault" as in /failure/.
But coincidentally, "faulting" is a term that describes obtaining
memory in a non-predictive manner; i.e. obtained expensively
[impacting overall performance]. The term "paging memory" [without
reference to a "fault"] refers to memory being obtained smoothly,
because the work asked explicitly for that data to be loaded into
memory. The term "page fault" refers to memory being obtained due to
work that requires something to be in memory, but for which that work
did not explicitly ask in advance to load that memory [or had asked
with an asynchronous request, but the actual reference to the memory
occurred before the system had processed the explicit request to load
that memory from disk].
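If an application-level analogy helps [this is only an analogy; the
real mechanism lives inside the operating system], the difference is
whether the load was requested far enough ahead of the reference:

  // Analogy only: an explicit asynchronous "load" avoids a stall only
  // if it completes before the data is referenced; referencing the
  // data with no request in flight stalls for the full I/O time, much
  // as a page fault does.
  import java.util.concurrent.*;

  public class PrefetchAnalogy {
      static byte[] loadFromDisk() throws InterruptedException {
          Thread.sleep(100);      // simulate slow disk I/O
          return new byte[4096];  // simulate one loaded page
      }

      public static void main(String[] args) throws Exception {
          ExecutorService io = Executors.newSingleThreadExecutor();
          // "Paging": ask in advance, do other work, then reference.
          Future<byte[]> prefetch =
              io.submit(PrefetchAnalogy::loadFromDisk);
          Thread.sleep(150);      // other work; get() below is instant
          byte[] page = prefetch.get();
          // "Page fault": no request was made in advance, so the
          // reference itself must wait for the whole load.
          byte[] faulted = loadFromDisk();
          io.shutdown();
      }
  }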
Too many QZDASOINIT jobs open?
Many database connections will inherently add to memory requirements
[for the jobs alone], and can potentially [very likely] lead to
increased memory use to implement the individual [usually query]
requests performed in those jobs. Yet if the jobs perform work using
the same objects and data, the memory requirements might not have a
huge impact. As well, common memory can be /fixed/ to prevent the
system from paging out that memory, only to have to page-fault that
data back into memory a few moments later.
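On the JAVA side, that suggests bounding how many QZDASOINIT jobs can
be driven at once, typically with a connection pool. A minimal sketch
[host, credentials, and pool size are placeholders; a production
application would more likely use an existing pooling DataSource]:

  // Cap concurrent database connections: all requests share a fixed
  // set of connections instead of each opening its own QZDASOINIT job.
  import java.sql.*;
  import java.util.concurrent.*;

  public class BoundedConnections {
      private final BlockingQueue<Connection> pool =
          new LinkedBlockingQueue<>();

      public BoundedConnections(int size) throws Exception {
          Class.forName("com.ibm.as400.access.AS400JDBCDriver"); // jt400
          for (int i = 0; i < size; i++) {
              pool.add(DriverManager.getConnection(
                  "jdbc:as400://YOURSYSTEM", "USER", "PASSWORD"));
          }
      }

      // Borrow a connection; blocks when all are busy rather than
      // opening yet another server job.
      public Connection borrow() throws InterruptedException {
          return pool.take();
      }

      public void release(Connection c) {
          pool.add(c);
      }
  }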
We have tried to identify any possible lock on the database, but
nothing appeared to be faulty at this level either.
Locks are just one of many different types of possible waits. As
already alluded to, the waits in the described scenario are likely
disk I/O waits to page memory.
Any help would be appreciated
The topic is probably not specific to the use of JAVA, thus probably
eligible for the midrange-l. However, general performance issues for
a system are very specific to that system, due to the environment:
the system and application configuration, and how the system and
applications are being utilized. The best bet is to utilize the
performance tools and experts [with that tooling] to help direct
where to look to improve the overall throughput. The never-ending
issue with such performance work is chasing the next bottleneck after
the issue causing the current bottleneck is deemed resolved [or as
good as it will get].
IBM sells tooling and [consulting] services to assist in that regard.
Other companies and consultants do as well.
FWiW: Given there are apparent database requests, the database
tooling [e.g. the index adviser] might be one place to start looking.
For example, an index might enable limiting the number of database
pages required to complete a query request, thus reducing the memory
footprint of the QZDASOINIT jobs that run the particular SQL query
for which the index was advised.
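For illustration only [the library, table, column, and index names
here are invented; the adviser reports the actual ones], creating
such an advised index from a JDBC connection might look like:

  // Hypothetical: create the index the adviser suggested, so the
  // optimizer can touch fewer pages for the offending query.
  import java.sql.*;

  public class CreateAdvisedIndex {
      public static void main(String[] args) throws Exception {
          Class.forName("com.ibm.as400.access.AS400JDBCDriver");
          try (Connection c = DriverManager.getConnection(
                   "jdbc:as400://YOURSYSTEM", "USER", "PASSWORD");
               Statement s = c.createStatement()) {
              s.executeUpdate("CREATE INDEX MYLIB.ORDERS_BY_CUST"
                  + " ON MYLIB.ORDERS (CUSTOMER_ID)");
          }
      }
  }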