John,

My understanding is that CPU queueing is a wait period that occurs after a
task has been dispatched to the CPU. If it has been dispatched, then what is
it waiting for?

I'll try to be clearer about my observations on the article
referenced in your earlier message concerning Java multi-core utilization
and parallel processing.

As you know, the author performs progressive testing of a pool of 50
threads, each performing a CPU-bound workload (string concatenation). He
points out that in the first test the threads run serially and use only 1
core, as opposed to running in parallel and using the 8 cores on the server.

The author asserts that the 1-second sleep time simulates remote
processing, but in this series of tests the sleep period evidently allows
all 50 threads to be instantiated concurrently before they run their
CPU-bound workload.

The author records that the elapsed run time for each thread is approximately
1.277 seconds in the first test, which includes the 1-second sleep. The total
elapsed time for the 50 threads was 65.298 seconds. 1.277 times 50 =
63.850, so the total elapsed time is consistent with a serial workload.

That raises the question: if you instantiate 50 threads in Java, why would
they run serially by default? Why wouldn't the OS dispatch them to multiple
cores?

As a relevant aside, I am aware of folks who have tried to configure Java
app servers to run hundreds of threads concurrently, with the intent of
dispatching Servlet instances across multiple cores, but never got it to
work. Multi-core servers remained woefully underutilized under stress, no
matter how many threads were allocated. Cross-reference those findings with
published benchmarks, which show a very strong correlation between the number
of app server instances and the number of cores on the benchmark platform.

That sets the stage for the author to explain the additional support in
java.util.concurrent
that allows the thread pool to be dispatched to multiple cores, which
leads to the results displayed in Images 7 and 8. Those show tasks having an
elapsed time of approximately 11 seconds each, with all 8 cores at 100%.
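Presumably the mechanism the author demonstrates is an `ExecutorService` from java.util.concurrent. A minimal sketch along those lines (the pool size and the task body here are illustrative, not the article's exact figures):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRun {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Submitting Callables hands them to pool threads, which the OS is
        // then free to schedule across every available core.
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            results.add(pool.submit(() -> {
                String s = "";
                for (int j = 0; j < 2_000; j++) s += "x";  // CPU-bound work
                return s.length();
            }));
        }
        for (Future<Integer> f : results) {
            System.out.println("result: " + f.get());      // blocks until that task is done
        }
        pool.shutdown();
    }
}
```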

That raises the question: what programmer would implement an interface that
causes a thread which normally executes in 0.277 seconds to run in an
environment that extends its elapsed time to 11 seconds?

Image 11 of the Profiler shows the threads alternating between "run" states
and "wait" states over a long elapsed period before the threads complete.
That's what I meant by "throttling".

If you removed the "sleep" and just let the 50 threads run serially, they
would use only 1 core and complete in approximately 15 seconds. Contrast
that with burning 8 cores over an elapsed time of 11 seconds. Who would do
that?
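For what it's worth, that estimate follows from the article's own numbers; a quick check:

```java
public class SerialEstimate {
    public static void main(String[] args) {
        // Per the article: ~1.277 s elapsed per task, of which 1.0 s is sleep,
        // leaving roughly 0.277 s of actual CPU work per task.
        double cpuPerTask = 1.277 - 1.0;
        int tasks = 50;
        double serialNoSleep = cpuPerTask * tasks;  // ~13.9 s on a single core
        System.out.println("serial, sleep removed: ~" + Math.round(serialNoSleep) + " s");
    }
}
```

Roughly 14 seconds of pure CPU work, in line with the ~15-second figure above.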

Nathan.


On Tue, Oct 21, 2014 at 10:41 PM, John Yeung <gallium.arsenide@xxxxxxxxx>
wrote:

On Tue, Oct 21, 2014 at 6:21 PM, Nathan Andelin <nandelin@xxxxxxxxx>
wrote:
Thanks for the reference, John. And I agree that Java can run pools of
parallel tasks via the "Callable" interface and "consume" CPU on multiple
cores. But it appears that even your reference illustrates the futility of
that interface.

In the example cited:

A "Task" appends a character to a string in a loop 20K times in order to
consume CPU. When running a pool of 50 tasks sequentially, each instance
completes in an elapsed time of 1.27 seconds, which includes 1 second of
"sleep" time. When run in parallel, each instance of the pool completes in
approximately 11 seconds.

Why would a programmer consciously "throttle" tasks which ordinarily
require essentially 0.27 seconds of CPU time and make them take longer
(effectively 40+ times longer) to complete, just to prove a point about
Java's ability to allocate work to multiple cores?

I'm not clear what you are referring to as "throttling". The 1-second
delay is a deliberate artifice to simulate a remote call. Basically,
in the real world, few applications are anywhere near 100% pure CPU.
They have to wait for I/O, they have to wait for other resources to
become available, etc. He was just trying to make the example more
realistic.

I don't know what the "effectively 40+ times longer" is supposed to
mean. Where do you get that from? 11 seconds is about 40+ times
longer than .27 seconds, but I believe he's saying that running all 50
tasks in parallel took a total of 11 seconds. Running them serially
took about 65 seconds. So using 8 cores produced a speed-up of a
little less than 6 times over using a single core.
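That speed-up arithmetic checks out against the figures cited in this thread:

```java
public class Speedup {
    public static void main(String[] args) {
        // Figures from the article, as cited in this thread:
        double serialTotal = 65.298;   // seconds for all 50 tasks, one after another
        double parallelTotal = 11.0;   // approximate seconds for all 50 tasks on 8 cores
        double speedup = serialTotal / parallelTotal;
        System.out.println("speed-up: " + Math.round(speedup * 10) / 10.0 + "x");
        // prints: speed-up: 5.9x -- under 8x partly because each task still
        // spends its first second asleep no matter how many cores exist
    }
}
```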

In the example cited, it took a pool of 50 Callable (submitted) Tasks to
drive 8 cores to 100% utilization. Why couldn't Java drive 8 cores to 100%
with a pool of just 8 Callable Tasks?

It absolutely could. He spent some of the article talking about how
the exact nature of the tasks affects how you'll want to configure
your pools. At the extreme of a completely CPU-bound, perfectly
8-way-parallelizable application, 8 threads for 8 cores would indeed
be the way to go.
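A common rule of thumb for pool sizing captures both ends of that spectrum. This is general concurrency guidance, not something taken from the article: scale the pool by the fraction of time each task spends blocked rather than computing.

```java
public class PoolSizing {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        // Purely CPU-bound work: more threads than cores only adds
        // context-switching overhead.
        int cpuBoundPool = cores;

        // Mixed work: grow the pool by the fraction of time each task
        // spends waiting. E.g., ~1.0 s of sleep out of ~1.277 s total
        // elapsed means each task is blocked ~78% of the time.
        double waitFraction = 1.0 / 1.277;
        int mixedPool = (int) Math.ceil(cores / (1 - waitFraction));

        System.out.println("CPU-bound pool size: " + cpuBoundPool);
        System.out.println("mixed-workload pool size: " + mixedPool);
    }
}
```

With 8 cores and the article's 1-second sleep, that works out to a pool of roughly 37 threads, which is at least in the neighborhood of the article's 50-task pool.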

Should application programmers take responsibility for allocating work to
multi-core servers? Isn't that the responsibility of the OS?

The traditional way has been for the OS to do all the allocation.
It's still the most efficient if all you have on the system are
single-threaded batch jobs. Giving the application programmer the
ability to do work allocation is for flexibility and finer control
over resources. In principle, the application programmer knows what
parts of the application can run in parallel, what parts need to wait
for other parts, etc.; things which would be difficult or impossible
for the OS to know without the programmer telling it.

Regarding Ronald Luijten's comment about Java not supporting multi-cores at
all, no, that didn't have anything to do with IBM i. It was just an
observation about Java.

Then his observation was just plain wrong.

I understand that the total elapsed time to complete 50 Task instances is
greater when run sequentially than in parallel (submitted). But how might
that apply to the question at hand, in Tim's original post?

I don't know. I really latched onto what I found were
misunderstandings of the capabilities of Java, and wanted to correct
them.

I don't know exactly what "CPU queuing" means, in the OP's context.

All benchmarks of Java web workloads indicate that you must run multiple
application server instances to fully utilize multiple cores. The ratio is
pretty much one to one, even though the application server may be
configured with, say, 100 active threads.

I don't doubt this, but I also don't see the relevance to CPU queuing.
(Again, maybe I would if I knew what it was.)

John Y.
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.



As an Amazon Associate we earn from qualifying purchases.

This thread ...

Follow-Ups:
Replies:

Follow On AppleNews
Return to Archive home page | Return to MIDRANGE.COM home page

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].

Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.