James,

Yes, as a computer programmer I would hope that solving the problem at hand would be implicit. I knew I should have been more specific :-) As a consultant/contract programmer I am often tasked with making the programs I write easy for the least capable person on a client's staff to understand. Sometimes I think trained monkeys could do better coding than the least competent programmer, but I get paid to satisfy my clients, so their wish/whimsy is my command. Believe me, it certainly chafes me to have to do this, or at least it aggravates my carpal tunnel problems, as I end up writing a page or two of documentation explaining how a given subroutine/procedure works, what business problem it solves, and what key assumptions were made in writing it. I honestly try to raise the bar wherever I go, and I usually spend a great deal of time teaching (which is not bad at all). When I program at home for my own pleasure, or on pet projects, I push the envelope and constantly expand my knowledge.

A source-level profiler, besides showing the number of times a given statement is executed, would also show the amount of time spent in each statement/procedure/subroutine. I am sure I could write a program to instrument the code and write all the relevant information into a user space, and I may have to, but I would not want to waste my time writing something that is already available or soon to be released.

James, as compiler technology improves and the back-end hardware changes, programming techniques that once made the code run faster can be worse on the next generation of compiler/processor. The line is blurring as to which optimizations are performed by the compiler and which are handled by the processor. For example, a common optimization technique is loop unrolling, where one takes a loop and replaces it with X iterations of the same code within the source.
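To make the idea concrete, here is a minimal sketch in C of unrolling by hand (the function names and the factor of four are my own illustration, not anything from the list discussion; a compiler doing this automatically produces the same shape of code):

```c
/* Straightforward loop: the counter update and conditional
 * branch are paid once per element. */
static long sum_simple(const int *a, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by 4: the body is replicated so the branch and
 * counter update are paid once per four elements instead. */
static long sum_unrolled(const int *a, int n) {
    long total = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++)   /* clean-up loop for leftover elements */
        total += a[i];
    return total;
}
```

Both functions compute the same sum; the unrolled one simply trades code size for fewer branches, which is exactly the cache trade-off discussed below.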
This can save a ton of time by reducing the number of jump instructions, which are generally not the cheapest opcodes, and making the code fall in line. However, sometimes this optimization can hurt: if the code has been made to overflow the cache, the computer has to flush the cache, bring in another segment of code and data, execute it, flush again, ad nauseam. The same thing applies to inline code (i.e., copied source). This is of course the absolute fastest way for a function to be implemented, as the compiler does not have to save its registers, push the parameters on the stack, make the jump, then pop all the parameters off the stack, dereference pointers, etc., as must be done when a person calls a procedure. But now we have branch prediction, very superscalar processors, and who knows what other techniques will be implemented in processors/platforms (e.g., NUMA, VLIW, massively superscalar processors, internal and external instruction reordering, more intelligent branch prediction).

In addition, perhaps a certain important function that was avoided due to performance problems will become implemented in hardware. If it makes it into hardware, you can guarantee it will be faster than software (the reason processors have an FPU). Perhaps in a future generation IBM will implement the date manipulations in hardware, and then POOF, there goes your assumption that date routines are slow. A person's optimizations can only be made for the point in time at which they make them. Future knowledge is reserved for our Creator.

A profiler could be, and is, one of the most useful tools for showing the effect of changes in the architecture and/or compiler technology being employed. Yes, the clock on the wall is fine for coarse-grained tests, but what I want is fine-grained performance information, and that is best provided by the compiler vendor or the hardware manufacturer (or someone with intimate knowledge of one or both parts of the equation).
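Going back to the inlining point above, here is a hedged C sketch of the trade-off (again, the names are mine for illustration; modern compilers perform this expansion automatically with `inline` or at higher optimization levels):

```c
/* Called out of line: arguments must be passed, a branch taken,
 * and a return executed -- overhead paid on every single call. */
static int scale(int x, int factor) {
    return x * factor;
}

/* Version that calls the procedure each iteration. */
static long scale_all_calls(const int *a, int n, int factor) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += scale(a[i], factor);
    return total;
}

/* Hand-inlined version: the body of scale() is copied to the
 * call site, so no call/return sequence is generated. Like
 * unrolling, this grows the code, which is why it can backfire
 * once the working set overflows the instruction cache. */
static long scale_all_inline(const int *a, int n, int factor) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i] * factor;   /* scale() expanded in place */
    return total;
}
```

Both versions produce identical results; only the call overhead and code size differ, which is why the win or loss depends on the processor and cache of the day.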
I have written a great number of systems on embedded hardware with ultra-low memory and even slower processors that had to respond in real time. Sometimes I would have to squeeze every last millisecond out of a section of code, and there is no way I could have done it without a profiler. When I would change back-end hardware, say from Motorola to Intel or Hitachi or whoever, many key assumptions that applied to one platform were totally different on the other (e.g., FPU performance on the PowerPC is faster than integer performance; that shocked a whole bunch of people).

About the English lesson, thanks! But I have never claimed to be anything other than reasonably conversant in the language.

I still know my users cannot tell the difference between 2ms and 500ms in an interactive application. As a rule, interactive applications are subject to much less optimization, as the computer may be sitting idle in the 3-second period that rests between the press of the Enter key on one screen and the next. I am not advocating programming like an idiot and doing everything a person can to eat as many clock cycles as possible; I am saying that a person does not (and should not) apply the same level of optimization to interactive programs as to batch. I am sorry that I was not specific enough in my last letter, and I hope this one goes a long way toward clearing the waters.

Eric

+---
| This is the RPG/400 Mailing List!
| To submit a new message, send your mail to RPG400-L@midrange.com.
| To subscribe to this list send email to RPG400-L-SUB@midrange.com.
| To unsubscribe from this list send email to RPG400-L-UNSUB@midrange.com.
| Questions should be directed to the list owner/operator: david@midrange.com
+---