> date: Fri, 10 Jun 2005 09:23:48 -0500
> from: "Joe Pluta" <joepluta@xxxxxxxxxxxxxxxxx>
> subject: Detached datasets and DRDA compliance
> 
> This is an interesting point, Steve.
> 
> > From: Steve Richter
> > 
> > I don't think it is an openness issue.  DRDA, I am guessing, is part
> > of a server-centric architecture.  Since the release of ADO.NET (5
> > years ago), and probably since SQL Server 2000, Microsoft has gone
> > with the detachable dataset approach to database serving.  They say
> > such a database setup is more scalable and more compatible with
> > server farms.
> 
> Not sure yet about the server farms issue... I need to think about that.
> But from a standpoint of pure server load, you're saying that the
> overhead of storing the result set on the server and sending it over to
> the caller on a request-by-request basis is higher than that of creating
> the entire dataset and shipping it as a detached set.

If you are connected, then there is a constant connection to maintain
for the entire life cycle of the application.  In a disconnected model,
the data is requested once, returned, and the connection is terminated,
a la web serving.  Fewer concurrent connections should mean more
availability.

> At face value, you seem to be correct on that.  Of course, that's really
> only useful when you have bounded sets; that is, sets of data where the
> requester already knows the beginning and end.  That's a bit different
> than the concept of a scrollable cursor, or even an updateable cursor.
> Detached commitment control isn't going to work very well.

Actually, a DataSet contains a lot of discovery information: you can
easily get column names, ordinals, and types, which would at least allow
for some dynamic discovery.  As such, the requester does not necessarily
need to know the bounds.  DataSets are scrollable and writeable.  A
DataSet can even contain multiple tables and enforce relationships
between them.  The idea is that you update the DataSet and eventually
use it to automagically update the database.  Personally, I'm too
hands-on for that, but the potential is intriguing.
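
To make that concrete, here is a minimal C# sketch of the kind of
discovery I mean (the connection string, table names, and column names
are all hypothetical, and I'm assuming SQL Server via ADO.NET):

  using System;
  using System.Data;
  using System.Data.SqlClient;

  class DataSetDiscovery
  {
      static void Main()
      {
          // Hypothetical connection string.
          string connString = "Server=myserver;Database=mydb;Integrated Security=SSPI";
          DataSet ds = new DataSet();

          using (SqlConnection conn = new SqlConnection(connString))
          {
              // Two SELECTs fill two tables in one detached DataSet.
              SqlDataAdapter da = new SqlDataAdapter(
                  "SELECT * FROM Orders; SELECT * FROM OrderLines", conn);
              da.Fill(ds);  // opens the connection, fills, and closes it
          }

          // The connection is gone; discovery runs against the local copy.
          foreach (DataTable table in ds.Tables)
              foreach (DataColumn col in table.Columns)
                  Console.WriteLine("{0}.{1} (ordinal {2}): {3}",
                      table.TableName, col.ColumnName,
                      col.Ordinal, col.DataType);

          // The DataSet can even enforce a relationship between its
          // tables (assuming both share a hypothetical OrderId column).
          ds.Relations.Add("OrderToLines",
              ds.Tables[0].Columns["OrderId"],
              ds.Tables[1].Columns["OrderId"]);
      }
  }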

> But for that segment of the application suite that is read-only on small
> but complex datasets, I can understand the detached set concept.
> However, since you still need both scrollable and updatable cursors for
> any sort of real development, this argument against server-side cursors
> (and DRDA) seems less persuasive.

Actually, according to MS, this is backwards.  For quick read-only,
forward-only access they recommend the DataReader, which employs a
connected model: you read through your loop and then close the
connection.  I'm not making any arguments, just sharing the way I
understand it.
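
For illustration, the pattern looks something like this (the connection
string and query are made up):

  using System;
  using System.Data.SqlClient;

  class ReaderDemo
  {
      static void Main()
      {
          string connString = "Server=myserver;Database=mydb;Integrated Security=SSPI";
          using (SqlConnection conn = new SqlConnection(connString))
          using (SqlCommand cmd = new SqlCommand(
                     "SELECT CustomerName FROM Customers", conn))
          {
              conn.Open();
              using (SqlDataReader rdr = cmd.ExecuteReader())
              {
                  while (rdr.Read())          // forward-only, read-only
                      Console.WriteLine(rdr.GetString(0));
              }                               // reader closed here
          }                                   // connection closed right after the loop
      }
  }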

> >  For example, the server-side SQL cursor is still supported by MS, but
> > its use is discouraged and it is not a part of the .NET framework.  You
> > have to roll your own.  The theory is that the server farm can
> > serve out a subset of the database as a dataset to the client very
> > efficiently.  The client then orders and joins the datasets as it
> > wishes, using the client's resources.  The server-side cursor, with
> > all its built-in functionality, puts a heavy load on the server.  So
> > it does not scale well.
> 
> This is even more interesting.  You're saying that the client is more
> equipped to join and order the data.  I'm not sure of all the database
> architecture involved here, but it seems like you're splitting the
> workload of the SQL engine between selecting data on the one hand and
> sorting it on the other.

It makes sense to me to let the server simply pass off the data in
the most efficient fashion.  At that point, the user application can
take over the ordering and presentation.  If the server is responding
more quickly and the user has more control over the presentation, then
it would appear to be an overall benefit.  I'm not sure how this would
apply to joins, though.
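
As a trivial example of what "taking over the ordering" looks like on
the client (the column name is hypothetical):

  using System.Data;

  class ClientSideOrdering
  {
      // 'table' is whatever the server handed back in one shot.
      // Sorting happens entirely on the client: no ORDER BY, no
      // second trip to the server.
      static DataView OrderByShipDate(DataTable table)
      {
          DataView view = new DataView(table);
          view.Sort = "ShipDate DESC";
          return view;  // bind this to a grid or walk it directly
      }
  }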

> And that doesn't make sense at first glance, especially on a join,
> because your selected data on joined tables would be limited by the
> results of the initial select.  Of course, I've never believed in the
> concept of distributed databases anyway, except in extreme
> circumstances.  Just think of it... if the three tables of a join are on
> three different machines, then the result of the first join must be sent
> to the second machine, which in turn must send both datasets to the
> third machine, which then sends all three datasets to the client, which
> only THEN begins ordering them.

I wouldn't want this either.

> Obviously, modifiers like HAVING and "FIRST N ROWS" wouldn't be able to
> reduce traffic and processing if the ordering were all client side.

Do you mean because these occur after the query results, like ORDER BY?
If that's the case, then why not take that workload off the server
anyway and let the end application handle it?  FIRST N ROWS speeds up
connected programs, granted, which is great for subfile programs and
displays that limit the number of rows from a larger set.  HAVING
doesn't speed anything up as far as I know; since it filters the
grouped results after the aggregation has already run, the whole
grouping still has to occur either way.  In both situations the total
server processing is not really affected... unless I misunderstand it...

One of the things I do with iHOC (our .NET query tool, soon to be
released) is push the idea of working locally with the data.  The
disconnected model allows the user to easily and quickly sort data,
apply filters, run statistics, export subsets, etc., all without
returning to the database.  The results are extremely fast, and the
burden on the server is gone once the initial dataset has been
retrieved.
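
In the same hypothetical vein as the sketches above, local filtering
and simple statistics look like this once the table is detached:

  using System;
  using System.Data;

  class LocalAnalysis
  {
      static void Main()
      {
          // Stand-in for a table already retrieved from the server.
          DataTable dt = new DataTable("Sales");
          dt.Columns.Add("Region", typeof(string));
          dt.Columns.Add("Amount", typeof(decimal));
          dt.Rows.Add(new object[] { "East", 100m });
          dt.Rows.Add(new object[] { "West", 250m });
          dt.Rows.Add(new object[] { "East", 75m });

          // Filtering is local: the server never sees this "WHERE clause".
          DataView view = new DataView(dt);
          view.RowFilter = "Region = 'East'";
          Console.WriteLine("East rows: {0}", view.Count);

          // Simple statistics without returning to the database.
          object total = dt.Compute("SUM(Amount)", "Region = 'East'");
          Console.WriteLine("East total: {0}", total);
      }
  }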

> To me it seems like what is happening here is that as you add servers to
> a database, the processing requirements grow exponentially, and to
> offset that, Microsoft will need to use the client CPU cycles as part of
> the database access mechanism.  That's about as non-scalable as I can
> imagine, and a neat way to lock clients into the servers.  Wow... you
> can't be a client on our server unless you also do a bunch of the work!

Depends on the servers.  iHOC will run against virtually any server, so
I don't see a lot of lock-in just because of ADO.NET or the database.
Yes, the client CPU is being put to more use, but so what?  With PCs
becoming more and more powerful, the user really doesn't notice, and the
server can serve the requests it receives better.  In fact, letting the
client do the work means much faster results than a return trip to the
database, even over Gigabit!  That means you get more out of the desktop
horsepower and require less from the server... seems pretty sound to me.

> Hee hee!  That's like saying you can eat at our restaurant, but you have
> to wash the dishes, too.
> 
> Joe

I'll have a piece of the pie... on a paper plate please! <vbg>

Joel Cochran
http://www.rpgnext.com

