Re: MS SQL Server inconsistency with READ UNCOMMITTED *and* READ COMMITTED

On Mon, Jan 30, 2012 at 2:53 PM, CRPence <CRPbottle@xxxxxxxxx> wrote:

On 30-Jan-2012 07:37 , Charles Wilt wrote:

<<SNIP>>

So DB2 for i gurus, how does DB2 do it?

They are called "dirty reads" for a reason :-)

Dirty reads I thought I could handle ...but the idea of double
counting or not counting some set of rows is another story because the
DB has had to add pages to the table is another story.

The ridiculous results shown in the COUNT(*) in the referenced
article would be hard to reproduce with the DB2 for i SQL because the
DB2 for i performs row-locking rather than "page" locking; i.e. nothing
to do with SLS on IBM i nor the clustered indexes of other DBs, just a
result of a more granular locking for database I\O by using record-level
locking rather than the bucket approach of locking an entire "page" of
rows to update as few as just one row. And without a WHERE clause, the
counts are likely to be correct because the dataspace and index each
maintain the count, so visiting each page would not be required.

I though MS SQL Server did support record lock, though I know it often
upgrades them to page locks. But from the description of the problem,
it seems directly related to the fact that the data could move between
pages due to the clustered indexes.

How about with a WHERE clause? And in particular one where the
optimizer is making use of an index but still needs the data pages.
ie. WHERE keyfld = somevalue and otherfield = othervalue.

To avoid problems with dirty reads either allocate the data exclusive
to the reader or the reader and every updater must participate with an
appropriate level of isolation that will prevent dirty reads.

READ COMMITTED is supposed prevent dirty reads, yet at least for MS
SQL Server the double counting or not counting of valid rows is
apparently still a problem.

However I would still avoid using COUNT(*) in a SQL SELECT from "live
data" if the count result must relate directly to some specific row data
of interest; that is, I would instead, FETCH and perform my own count of
the specific rows. If just interested in grouping counts, I would
probably be content in the point-in-time result when querying live data.

That seems to go against one of the points to SQL in the first place,
namely giving the query to the DBMS and letting it figure out how to
give you the correct answer.

A point in time answer would be fine, but the issue is that the answer
returned was never actually correct.

If you can't trust the DBMS to do a simple count, how can you trust it
to do anything else? :)

Using a row-changed timestamp [in an index] and allowing only read of
committed data, illogical results could be eliminated by counting only
rows that have not been updated since the start of the query. A row
that might otherwise be double-counted could not be since a timestamp
for the since-changed would then preclude counting that row again. I
did not read anything about "snapshot isolation" but would presume that
could be a means to implement.?

Well from what I've read, with snapshot isolation in theory the DBMS
basically never updates or deletes a row, only inserts. The result in
Oracle is kinda what you describe with the SCN (System Change Number)
taking the place of a timestamp field. In practice, the prior
versions of the data are rebuilt from the redo/change/transaction logs
as needed.

Still, it seems like all that work ought to be done in the DBMS, MS
SQL Server added it, DB2 LUW added it, I was think maybe DB2 for i
just didn't need it.

Charles

Re: MS SQL Server inconsistency with READ UNCOMMITTED *and* READ COMMITTED - How about DB2 i?

Re: MS SQL Server inconsistency with READ UNCOMMITTED and READ COMMITTED - How about DB2 i?