RE: members in a file -- MIDRANGE-L

Hey Joe!

Happy New Year!

I found a couple things using google - one an article by Michael Cain about the time V5R2 came out -

On average, a column stat takes up approximately 8K - 12K per column.

Then there is the redbook written by, among others, our co-posters Birgitta and Elvis, available at
http://www.redbooks.ibm.com/redbooks/pdfs/sg246654.pdf

I was in the performance lab testing this stuff in 2001 - had to go to that redbook to be reminded what is in the stats
- cardinality - a single value
- frequent values
- selectivity - uses a histogram with, i think, 256 boxes - or 512 - anyhow, it has a fixed top limit, as I recall.

All this is available with greatest accuracy from indexes, but if those are absent, a kind of table scan will be run to gather these - histograms were based on a sampling, but the algorithm was one similar to that used in DB2 on other platforms. And there is a max length for the value to probe this hash table that supports the histogram - i might have this mixed up - senility sets in! And the probe value is limited - say, 256 characters, to limit overall size of things.

The point is, it is a fairly finite and consistent size for this information for a given column. One could get a sense of it, perhaps, by creating a table, populating it with various kinds of distribution and skew for different columns, see what its size is. Have it so the stats are not collected against it. Then run stats against every column and see the difference in size of the object. Create EVI indexes over each column - they provide all these stats best of all - or regular indexes.

Regards
Vern

At 10:36 AM 12/29/2007, you wrote:

> From: BirgittaHauser
>
> >>What happens if you stop and start collection of SQL Statistics?
>
> Statistics are used by the SQE optimizer to determine the optimal index.
> The statistics contain information about how much rows in a table meet the
> available key values in an index.
> For example if you want to select all order from a specified client and
> around 80% of all orders are from this client, the optimizer will favor a
> table scan instead of using an index. If an other client will be selected
> an index can be used.

> (...snip...)

> BTW the CQE optimizer can only use estimates. For example for an =
> specified
> in a where condition 10% of the rows in a table are expected/estimated.

And it is this collection of statistics that makes SQL so powerful. Without
it, a good programmer with knowledge of his database can still use ISAM
techniques to provide better performance than SQL because he will in effect
be able to make the decisions that the CQE cannot.

As statistics get better, SQL matches and then surpasses the low-level
programmer because its information is always correct while the programmer is
relying on his/her knowledge of the system, which may be out of date.
Similarly, though, if you stop the collection of statistics, the SQL
statistics will become stale and it will begin to make poorer decisions.

And of course, statistics don't help as much in completely ad hoc
situations, because the SQL engine will have no statistics on queries that
have never been done. But in general, SQL queries will just get better and
better.

One thing I do worry about is exactly how much space is taken up to store
the information required to make the queries faster. I'd love to see the
statistics on what percentage of disk space is required to store the
statistics for a file. If it's something on the order of a tenth of a
percent, it's not a big deal. If it's 10% or more, it's a big deal!

Joe

--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.