RE: SQL database makovers (was: Query optimizer tells me to build an... -- MIDRANGE-L

This is starting to get off topic but I wanted to put in my $0.02.

A relational database is a tool for storing data that follows the relational
principles set up by Codd. The 12 principles (from
http://news.dcn-asu.ru/BOOKS/Special.Edition.Using.Oracle8/ch01/0014-0016.ht
ml - note that the text is from an Oracle perspective) are:

1. The information rule. Information is to be represented as data stored in
cells. 

2. The guaranteed access rule. Each data item must be accessible by
combination of table name + primary key of the row + column name. For
example, if you could access a column by using arrays or pointers, then this
would violate this rule.
 
3. Nulls must be used in a consistent manner. If a Null is treated as a 0
for missing numeric values and as a blank for missing character values, then
this violates this rule. Nulls should simply be missing data and have no
values. If values are desired for missing data, vendors usually offer the
ability to use defaults for this purpose. 

4. An active, online data dictionary should be stored as relational tables
and accessible through the regular data access language. If any part of the
data dictionary were stored in operating system files, this would violate
this rule. 

5. The data access language must provide all means of access and be the only
means of access, except possibly for low-level access routines (see rule
12). 

6. All views that may be updatable should be updatable. If, for example, you
could join three tables as the basis for a view, but not be able to update
that view, then this rule would be violated. 

7. There must be set-level inserts, updates, and deletes.

8. Physical data independence. An application cannot depend on physical
restructuring. If a file supporting a table was moved from one disk to
another, or renamed, then this should have no impact on the application. 

9. Logical data independence. An application should not depend on logical
restructuring. If a single table must be split into two, then a view would
have to be provided joining the two back together so that there would be no
impact on the application. 

10. Integrity independence. Integrity rules should be stored in the data
dictionary. Primary key constraints, foreign key constraints, check
constraints, triggers, and so forth should all be stored in the data
dictionary. 

11. Distribution independence. A database should continue to work properly
even if distributed. This is an extension of rule 8, except rather than only
being distributed on a single system (locally), a database may also be
distributed across a network of systems (remotely). 

12. The nonsubversion rule. If low-level access is allowed, it must not
bypass security nor integrity rules, which would otherwise be obeyed by the
regular data access language. 

An additional rule (rule 0) was added later
0. For a system to qualify as an RDBMS, that system must use its relational
facilities exclusively to manage the database



SQL is the language usually used for accessing data stored according to the
principles; however, it does not have to be. 

No RDBMS products I am aware of strictly adhere to all of the 12 rules. 

I think the "flat file" (which are not really flat files) system native to
the 400 most likely would be considered relational according to the rules. 

Indexes are not a requirement of a relational database; they are simply a
way to improve performance of a relational database by preloading the
information necessary to perform sort operations

-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx
[mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Vern Hamberg
Sent: Tuesday, March 30, 2004 6:09 PM
To: Midrange Systems Technical Discussion
Subject: Re: SQL database makovers (was: Query optimizer tells me to build
an...

There was a discussion some time ago, to the effect that the database on the
400 (and the 38 before that) is not a true RDBMS. However, I seem to recall
that it was always presented that way. If I understand the Date book a
little, it seems that the underlying physical structure of a database does
not define whether it is relational, rather a higher level abstraction does
that. BTW, physical here is not used in the same way as it is in the term,
"physical file".

AFAIK, physical files are not the same as flat files. They ARE the same as
tables created through the SQL CREATE TABLE statement.

And keyed logical files and SQL indexes are essentially the same thing -
B-trees.

SQL views are superficially called logical files but have no real access
path - they are merely a container for an SQL SELECT statement that is
executed to generate something like an ODP (open data path).

Physical files and indexes all use a "primitive" construct called a data
space. You CAN trace your way through a file to its associated data space in
SST - I forget the details right now, haven't done so for a while.

The object-based nature of the 38/400/iSeries allows each object to use the
same base objects (such as data space) with different attributes for each
subtype.

Notice that I did not include the 36 in my list. As I understand it,
physical files were typically fixed-length but unstructured records, whose
fields were defined in a HLL.

To some degree, I'm inclined to define an RDBMS by the methods of accessing
it - SQL is a relational language (with some deviations from pure Codd-RIP,
yes), so anything that has sufficient SQL capability would be relational. 
That's probably not what strict practitioners would say, so please enlighten
me.

So I'm asking for clarification of the statement that DB2/400 is a flat file
system and not an RDBMS. What do the terms mean? At what architectural level
are you applying these terms? And how do these distinctions matter?

Thanks
Vern

At 10:45 AM 3/30/2004 -0700, you wrote:
>Gang,
>
>Was there any more discussion on this subject, especially by someone 
>that truly understands the differences of how the iSeries handles flat 
>files built using DDS vs. how the iSeries and its DB2/400 engine handles
>the tables and indexes built using DDL(SQL)?   I'd like to see a
>technical document from IBM Rochester and/or IBM Santa Teresa Lab (DB2 
>build site ??) and/or IBM Toranto (DB2 build site) that addresses all 
>the ins and outs and how the iSeries handles this subject.
>
>I suspect if the DB2/400 engine handles things anything like the 
>mainframe (DB2/MVS, DB2/VM and DB2/VSE), you'll find the OS/400 and 
>DB2/400 handle access to data more efficiently than the traditional 
>flat file DDS data access via the old key construct as all this relates 
>to queries.
>
>However, since the OS/400 is a different breed of cat relative to the
>mainframe operating systems, I don't really know.   Which is why I'd
>like IBM to speak about how things really work.
>
>I can tell you one thing, the faster iSeries folks move to a true 
>relational data base design vs. the traditional flat file model, the 
>better off they will be especially as they get pushed more toward 
>decision support systems for management that usually includes data
>marts, data warehouses, data mining, OLAP, and more.   I would also
>suspect, as the iSeries and the pSeries come closer in their use of the 
>same processors and hardware architecture and as something like Linux 
>gets more popular because of economic issues, you'll see most 
>application products built to a RDBMS of some sort, probably DB2 and 
>Oracle.  Therefore, the quicker shops move their application 
>development toward the true relational model, the better.
>
>FWIW,
>
>Dave Odom
>Arizona
>
>date: Mon, 29 Mar 2004 15:25:17 -0500
>from: rick.baird@xxxxxxxxxxxxxxx
>subject: SQL database makovers (was: Query optimizer tells me to build
>         an....)
>
>This talk of the difference between DDS and SQL created access paths 
>has caused me to question long and deeply held beliefs when it came to 
>database design on the iseries.
>
>When designing databases, i'd always tried to follow these few simple 
>guidelines (others may disagree with them, but let's not quibble over 
>that part - it's always worked pretty well for me).
>
>1.  most physical files have a key, a unique one if appropriate.
>2.  logical files were built for any access paths needed for inquiries, 
>or batch pgms that would be used daily or more (generally).
>3.  everything else would be handled via opnqry.
>3.  opnqryf would be used for record selection rather than creating a 
>bunch of single purpose logicals.
>4.  if performance problems are a concern or are encountered, logicals 
>are created as indicated to help out.
>
>How does the apparent superiority of SQL generated access paths change 
>these rules?
>
>Would you make a point of replacing existing logicals with SQL indices 
>where possible?
>
>Would you build SQL indices and views next to any logicals (or physical
>keys) that can't be directly translated to SQL?
>
>If not (or not always), why not and/or when?
>
>and one more question...  Can you specify on an SQL view or index the 
>Access path maintenance  (MAINT keyword) or Access path recovery 
>(RECOVER keyword)?  If so, how?
>
>Thanks,
>
>Rick
>
>--------original message---------
>midrange-l-request@xxxxxxxxxxxx wrote:
>
> >   1. Re: Query optimizer tells me to build an access path that
> >      already   exists (Clare Holtham)
> >
> >That's one of the indexes you ALWAYS want to build.
>
>This subject does show a glimpse of the differences between PFs/LFs and 
>tables/views/indexes. I've seen that if an LF is created over a PF and 
>both share the same key, then the LF file description shows the PF as 
>owning the access path. But a SQL CREATE INDEX using the same field 
>doesn't appear to share the existing PF access path; it seems that a 
>brand new one is created. (There might be ways of getting LFs and SQL 
>indexes to coincide better, but I haven't learned them.)
>
>Two obvious considerations come to mind -- double access-path 
>maintenance and double space for storing the access paths.
>
>Tom Liotta
>
>_______________________________________________
>This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing 
>list To post a message email: MIDRANGE-L@xxxxxxxxxxxx To subscribe, 
>unsubscribe, or change list options,
>visit: http://lists.midrange.com/mailman/listinfo/midrange-l
>or email: MIDRANGE-L-request@xxxxxxxxxxxx Before posting, please take a 
>moment to review the archives at 
>http://archive.midrange.com/midrange-l.


_______________________________________________
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx To subscribe, unsubscribe,
or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx Before posting, please take a
moment to review the archives at http://archive.midrange.com/midrange-l.