Re: DB2 for i "Backus?Naur Form" specification (Sue Romano) -- MIDRANGE-L

Hi Sue,

Here is our use case as to how and why we would use parsed SQL statement
component references.

Use case is for large database migrations. A client database migration
consists of about 1500 SET based SQL scripts. About 1000 of those scripts
run in a sequenced order, and the rest run not sequenced (no dependencies).

Some run in a particular order, due to things like:
* Load parent tables before child tables, or load downstream tables after
their upstream dependency tables have been loaded.

The SQL scripts are stored in the database itself, and dynamically run from
there. Each script row contains script group and run sequence columns that
together control the run order.

When a developer adds a script to the database, or modifies one, we'd like
to parse the SQL statement for its list of referenced base tables, and warn
the developer, and/or create a DB to-do list task, if the script is
referencing a table that logically has not yet been loaded yet at that
point in the migration, based on the database driven run sequencing. For
example, warn the developer if he/she inserted a new script to load a child
table, but based on the script database run sequencing, the parent table
has not yet loaded (i.e. scripts are configured to run in the wrong
sequence). The developer now knows that run sequencing needs to be
adjusted, or a parent table script is missing.

Using the parsed list of table references from the SQL, we could also warn
the developer if a reference has been made to a table that does not exist
(i.e. misspelled table).

If we had access to every base column an SQL statement referenced, when
scripts are added or modified, we could add code dependency checks to catch
references to columns that don't exist (i.e. a developer misspelled a
column name, or referenced a non-existent one).

The same applies to views and functions (i.e. catch misspelled, or
non-existent view and function names).

If we had all the tables, columns, views, and functions an SQL statement
referenced, we'd enhance our script database to store all that parser
detail into new table(s), and populate those new reference table(s) inside
triggers over the script table. Storing all that extra detail would then
provide the basis for fast dependency checking.

As it stands now, we have to manually run SQL queries over the SQL script
database to find dependency mistakes. That process could be automated if
we captured the SQL parser output and fed off it.

By adding those code dependency checks, the set of scripts should contain
less mistakes or omissions when the developer starts unit or system
testing.

Captured parser data would also allow the creation of an audit script to
run right before a system test, that would find all remaining dependency
issues across the entire script database, that the developers never dealt
with. We'd rather fix them before submitting a long running job, instead
of leaving defects intact only to predictable halt the job at 2am :( In
theory, we could also use SQL code to automatically fix some of the script
sequencing defects (i.e. scripts configured to run in the wrong order).

In short, we'd speed up development time and shorten test cycles if the SQL
script database contained a full table, column, view, and function cross
reference of itself, inside the same DB2 database, in SQL query-able form
(as extracted and stored from the IBM SQL parser). Just like there's
X-Analysis for RPG, we'd like to use the DB2 database and SQL to perform
that same cross reference functionality for SQL.

Are there enough people in the industry (inside or outside of IBM), that
would similarly benefit, now or in the near future, to justify IBM's effort
to create, test, document, and maintain public use SQL interface(s) to the
SQL parser data. I don't know. But, there may some IBM software products
that could leverage that data, resulting in enhanced products.

Since I've not dived into GET DESCRIPTOR (yet), perhaps it already has what
I need. Need to make time for that.

We've seen a large improvement in efficiency by moving SQL code out of
source files and into the database, and access to the SQL parser data would
allow us to capture a lot of metadata for code, instead of just using
metadata for (primarily) data. IT can benefit from data just as much as
business units benefit from data. I think we have a tendency of
overlooking that since we're so busy automating the business units. Code
is IT's most important data, and data about our code is very useful.

Happy Holidays!

Mike