Reading this discussion, I was pondering how I'd try to solve it, but the
way you put this last question pointed out something much more fundamental.
Let's just assume for the minute you can write a single SQL statement and
get a single result set that provides all of the above, and allows the user
to page up/down as they wish. Obviously you're talking about keeping hold
of this result set between each call to the servlet, which means storing
the result set in the users' session, unless I've misunderstood?
Nigel, this to me is the 800 pound gorilla in the room when you're talking
about SQL vs. ISAM. When you're using ISAM of any form (be it dBase or
DB2), you are really reading the index file. In the early days, there were
actually separate files for the data and the index, and you would open the
index and walk along it. (I actually spent a summer writing an indexed
access conversion method for something called an Ontel word processor. It
stored its data (and indices) in ASCII in an on-board disk that was always
filling up, and we wrote an interface that allowed an EBCDIC machine -- a
Series/1 -- to act as the central database for multiple word processors.
This was way before DOS, Windows and Word.)
Anyway, as long as you keep your pointer into the index, it's child's play
to run back and forth through the file. And searching the index is just a
simple b-tree lookup, so even for a billion-record file you could find a
record in a maximum of about 30 accesses (plus one to read the actual data
record). That's how ISAM works, and that's why it's been so beneficial to
the business community. Add the single store of the System i, and you've
got virtually instant access to often-used files.
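That "about 30 accesses" figure falls straight out of the arithmetic. A minimal sketch (the function name is mine, not from any library):

```python
import math

def btree_max_probes(records: int, fanout: int = 2) -> int:
    """Upper bound on index probes to find one key in a balanced tree."""
    return math.ceil(math.log(records, fanout))

# A binary search over a billion keys needs about 30 probes.
print(btree_max_probes(1_000_000_000))       # -> 30

# Real B-tree index pages hold hundreds of keys each, so the actual
# number of page reads is far smaller -- only a handful.
print(btree_max_probes(1_000_000_000, 500))  # -> 4
```

With a realistic fanout, the top levels of the index are almost always in memory, which is why ISAM positioning feels instantaneous.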
So let's look at the SQL way of doing things. In the worst case, you have
to recreate the view every time. In effect, the SQL engine builds a brand
new index. And this index only includes the records you select, which means
you have the records starting at some key, and no records prior to that.
That issue can be addressed to some degree with permanent views, which are
in effect logical files. I've heard arguments that SQL removes the need for
logical views, but the reality is that unless this is a file of trivial
size, you will need the views.
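One caveat worth noting: in most SQL engines a view is just a stored query, and the access path still comes from an index on the base table. A small sketch using SQLite as a stand-in engine (table, view, and index names are mine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE INDEX idx_customer_name ON customer(name);
    -- The view plays the role of a "logical file" over the physical table.
    CREATE VIEW customer_by_name AS
        SELECT id, name FROM customer ORDER BY name;
""")

# Ask the optimizer how it would run a keyed read through the view.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customer_by_name WHERE name > 'M'"
).fetchall()
# The plan should show a SEARCH using idx_customer_name: the view itself
# added no new access path; the base-table index does the work.
print(plan)
```

That is different from a DB2 for i logical file, which can carry its own access path; with plain SQL views you still have to create and maintain the indexes yourself.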
Now you have the mechanics of reading the data (let's not even talk about
updating at this point). There are only two ways to do this: one is a
result set, the other is a row set. The result set has to keep some sort of
connection to the database as long as it is alive. How the SQL engine does
this is beyond my meager knowledge of SQL and especially ODBC, but somehow a
persistent pointer is kept. This pointer is good throughout the life of the
result set, with the caveat of course being that you can't read before the
beginning or past the end of the specified results, and you can't position
yourself by key within the list. On Wikipedia, you can read the following
caveat for JDBC:
"Note that connections, statements, and resultsets often tie up operating
system resources such as sockets or file descriptors. In the case of
connections to remote database servers, further resources are tied up on the
server, eg. cursors for currently open resultsets."
The row set, on the other hand, is a disconnected set of data, but it
requires that you read all the records into the row set; with a million-record
file, that's simply not feasible. So really, your only option is the result set.
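There is a middle path worth mentioning: keyset (or "seek") pagination, which keeps nothing open between requests. Instead of a live result set in the session, you remember only the boundary key and reissue a short query per page, much like repositioning an ISAM pointer in the index. A hypothetical sketch using SQLite (table and column names are mine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (sku INTEGER PRIMARY KEY, descr TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(n, f"item {n}") for n in range(1, 101)])

PAGE = 10

def page_down(last_sku: int) -> list:
    """Next page: rows strictly after the remembered key."""
    return conn.execute(
        "SELECT sku, descr FROM item WHERE sku > ? ORDER BY sku LIMIT ?",
        (last_sku, PAGE)).fetchall()

def page_up(first_sku: int) -> list:
    """Previous page: rows strictly before the key, re-sorted ascending."""
    rows = conn.execute(
        "SELECT sku, descr FROM item WHERE sku < ? ORDER BY sku DESC LIMIT ?",
        (first_sku, PAGE)).fetchall()
    return rows[::-1]

first = page_down(0)               # skus 1..10
second = page_down(first[-1][0])   # skus 11..20
back = page_up(second[0][0])       # skus 1..10 again
print(second[0][0], back[0][0])    # -> 11 1
```

Each page is a fresh, cheap indexed query, so nothing ties up a connection between user requests. It doesn't give you arbitrary positioning within an open cursor, but it maps naturally onto a stateless servlet.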
Can you get away with doing things like that? I would have thought keeping
the result set open would keep the usage count up on the connection,
and that would keep active connections to the DB used up from the
connection pool. How would those connections be freed? We don't know when
the user leaves that particular page, so it's only going to be when the
session expires, which could be 30 minutes to a couple of hours later. If
you've got thousands of users going to different pages on your site, and
all those pages have page down/up features, wouldn't you use up all your
DB connections from the pool very quickly?
I'd be interested to know more about the internals. If I have 1000 users
paging through the customer master file, does that mean I have 1000
connections to the database to keep open? Is that 1000 sockets? Is it 1000
indexes, or are they sharing the same index? Now they jump to the item
master inquiry and start paging through the item master file. I can have a
different resultset on the same connection, so hopefully it's still only
1000 connections. But it is 2000 resultsets. But is it 2000 indexes? Or
just 2000 cursors into the same two indexes? My guess is that it depends on
the SQL statement, and whether you have shared VIEWs that can be used. If
the filtering is different (my company vs. your company, for example, or my
user id vs. your user id for row-level security), then it seems to me the
engine can't share access paths as easily and will probably create temporary
ones.
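On the "same connection, multiple result sets" point, at least, the answer is straightforward: one connection can carry several open cursors, each with its own position, all sharing the same underlying table and index. A minimal sketch with SQLite (whether the cursors share access paths or spawn temporary ones is engine-dependent, as noted above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(n,) for n in range(10)])

cust = conn.cursor()   # "customer master" inquiry
item = conn.cursor()   # "item master" inquiry
cust.execute("SELECT k FROM t ORDER BY k")
item.execute("SELECT k FROM t ORDER BY k")

cust.fetchone()        # advance one cursor two rows;
cust.fetchone()        # the other cursor's position is unaffected
cust_row = cust.fetchone()[0]
item_row = item.fetchone()[0]
print(cust_row, item_row)  # -> 2 0
```

So 1000 users on two inquiries can plausibly be 1000 connections and 2000 cursors, not 2000 connections; the open question is what each cursor costs the server.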
With a message-based architecture, you just have one job per user. There's
just one connection to a controller that routes requests and returns data.
That controller can have all the files open, and has to do its own
selection, but that's what the machine is made for, and that's why I like
message-based systems and always have. Even if my message-based system uses
a cursor into some embedded SQL under the covers, at least I have control
over my resource usage.
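The shape of that architecture can be sketched in a few lines: one controller job owns the database resources, and user "jobs" just exchange messages with it, so resource usage stays bounded no matter how many users are paging. A toy sketch using a thread and a queue as stand-ins for jobs and data queues (all names are mine):

```python
import queue
import sqlite3
import threading

requests: queue.Queue = queue.Queue()

def controller() -> None:
    # The controller is the ONLY holder of a connection / open files.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE acct (id INTEGER PRIMARY KEY, bal INTEGER)")
    conn.executemany("INSERT INTO acct VALUES (?, ?)",
                     [(n, n * 100) for n in range(1, 6)])
    while True:
        acct_id, reply = requests.get()
        if acct_id is None:          # shutdown message
            break
        row = conn.execute("SELECT bal FROM acct WHERE id = ?",
                           (acct_id,)).fetchone()
        reply.put(row[0] if row else None)

worker = threading.Thread(target=controller)
worker.start()

# Any number of "users" can send requests; still just one DB connection.
reply: queue.Queue = queue.Queue()
requests.put((3, reply))
bal = reply.get()
print(bal)                  # -> 300
requests.put((None, None))  # tell the controller to shut down
worker.join()
```

The selection work moves into the controller, but the resource ceiling is fixed by design rather than by however many sessions happen to be holding result sets open.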
Just curious since it seems an interesting way to try to solve the general
page down/up requirement. I wouldn't have even tried to use less than 3
Well, whenever I throw issues like this up to the "all SQL all the time"
crowd, I never get a response, or if I do it's some cop out like, "Any
implementation of your trivial request will simply be criticized based on
your rules, which change and grow with every post." If it's so trivial,
then why not just write it? Well, because it can't be done <grin>.
I'm not bashing SQL, either. I like SQL a lot for lots of things, but I'd
really like to know how this all works, and how it stacks up in real
performance terms. I wonder if there's anybody out there who really
understands how this SQL connection business actually works under the covers.
But as far as performance testing goes, I think I may have to do that myself,
which means it's time to fire up the old performance tests again.