On 29-Jul-2014 11:09 -0500, John R. Smith, Jr. wrote:
I have some questions about variable length fields in files created
using DDS.
<
http://archive.midrange.com/midrange-l/200911/msg01287.html>
_Another Question about variable length fields_
"... Coding Techniques for SQL Performance by Kent Milligan ..."
When I learned about them, I was told that if there is a field with a
max of 100 bytes with a VARLEN(20),
FWiW, in SQL vs DDS, that would be: VARCHAR(100) ALLOCATE(20)
the system allocates 20 bytes in the record and if the value is
longer than 20 bytes, it stores the additionally needed bytes
elsewhere.
That is an essentially correct description of the function,
generally, as implemented. Nuances, for example due to minimums or
other factors, may have some alternative implementations. As
implementation details, they are subject to change, but what is exposed
via an interface to the programmer remains unchanged.
A copy of the referenced document in the above archive link does not
have a /picture/ of a similar example to VARCHAR(100) ALLOCATE(20) :-(
Only a picture showing the equivalent of the minimal and maximal
allocations for the chosen size, so effectively showing equivalents of
both: VARCHAR(100) ALLOCATE(0) and VARCHAR(100) ALLOCATE(100)
When the record is read, the OS knows to retrieve the 20 bytes from
the record and concatenates the additional bytes from the elsewhere
storage (if any exist) and it returns a 100 byte field.
Essentially. Though when the value overflows the allocated length, I
do not recall if perhaps up to the entire value might be stored in the
auxiliary varlen segment.
For the row-level-access (RLA) the database [almost exclusively]
returns contiguous fixed-length data for the fields of each row via a
Data Management buffer; i.e. the buffer is also fixed-length with the
ability to store one or many records. For the field definition from the
example scenario, the database would present in the buffer, a full
102-bytes of storage that is prefixed with the unsigned 2-byte integer
length attribute. For any value of length-n, any data beyond position
n+2 is undefined\unpredictable, although the database will for RLA [with
a buffer of fixed-length records], forcibly blank-pad the values in the
102 bytes of storage.
For most languages, what is an effective host variable [generically,
not solely as an SQL HV, e.g. a DS from an external Record Format] is
similarly 102 bytes of storage allocated for that example. However what
the database actually /returns/ to a program is a varying length result
of a known value and length; what the recipient language [user]
_perceives_ as the representation of that result is immaterial, from the
perspective of the database. The value might be represented in a
language as two contiguous scalar primitives, an integer *BIN2 and a
*CHAR LEN(100) each with maximally allocated storage, but the database
is returning only the actual value as both a length and the number of
characters of that length; i.e. there is no requirement that the
database had reserved\carved-out the storage for the number of bytes
beyond the length of the string up to the maximum length of the field,
and why any data beyond the defined /length/ of the actual value could
be anything.
The net result is less DASD tied up with extraneous spaces in the
file with no impact to the developer.
Indeed, the fixed-length record storage requirements can be reduced,
by having eliminated blank-padding for values shorter than the allocated
length of both fixed-length and varying length columns. Although
varying length column values are not padded, the minimal amount of
storage is the allocated length, but data for values exceeding the
allocated length can be stored in a more compact manner external to the
fixed-portion of the row [per reduction in those "extraneous spaces"
that would otherwise be maintained in the fixed-length record storage
within the dataspace].
However, when I look at the file via DSPPFM or WRKLNK <sic; DSPF>, I
see all 100 bytes even if all 100 are spaces.
The Display Physical File Member (DSPPFM) and the Display File [Data]
(DSPF) commands [note: not the Work With Object Links (WRKLNK), which is
a menu feature, not a means to view the data] are both using the RLA and
each feature presents row\record data as fixed-length with data from
contiguous fields implying storage [start and end] locations that match
the output from the Display File Field Descriptions (DSPFFD) command.
There is no method provided by the operating system database to present
the distinct locations of the sub-components of varying-length data; the
actual location of those, is known only to the LIC database, and the
column data is only returned by the LIC to the DB2 for i as one entity.
FWiW: When I want to visualize the actual storage of the data, I will
dump a copy of the dataspace and vlen aux SID using the STRSST D/A/D;
the database file member from which data would be dumped would start as
an empty duplicate made from the file with the desired record format
[column definitions], since populated with just some few specific data
records that have some variations of data that would be both
easy-to-see+find and of various length(s) -- possibly one as the empty
string, another below the allocated length, one matching the allocated
length, and another [probably a couple] overflowing the allocated length.
My questions are:
1) Is the OS smart enough to pad out the 21-100 bytes when displayed
via WRKLNK <sic; DSPF> and DSPPFM so I don't see the difference or is
my understanding of how the DDS VARLEN works flawed?
I am unsure I understand the question. But the DB2 for i will
present the datum as a length attribute and that much data following.
The RLA /read/ feature will pad short values with blanks, mostly only
for consistency, because the RLA is providing what is inherently a
fixed-length record interface [up to 32K].
2) Assuming my VARLEN is guessed correctly and most records fit into
the allocated bytes, is there any noticeable impact with using VARLEN
fields?
Given the allocated-length specification on the VARLEN() or
ALLOCATE() allows a large proportion of rows having the varying data fit
without overflowing the allocated length, then the effect could be
noticeably positive due to reduced overall storage in the dataspace and
minimal secondary read\write activity required [to access the data in
the variable-len segment].
The use of VARLEN, with or without a allocated-length specification,
can have a "noticeable impact" on the processing of the data; most
notably, how the data is stored and retrieved. How much impact this has
on the user\programmer may go effectively unnoticed, or may be noticed
as either a positive or negative impact. The programmer+DBA may need to
make a decision that tempers the requirements for the LIC DB to
read\write data outside of the fixed-portion of the dataspace; a choice
that is intended to increase any positive effects and limit any negative
effects. The likely trend\tendency would be positive for storage and
performance.
3) If my guess is incorrect and a big chunk of records require the
additional 21-100 bytes, does this have a huge performance impact?
Given the allocated-length specification on the VARLEN() or
ALLOCATE() allows only a small proportion of rows having the data fit
without overflowing, then the effect could be noticeably negative due to
the overall amount of data stored outside the dataspace, maximizing
secondary read\write activity required [to access so much of the data
within the variable-len segment]. Depending on how the file is used,
how [often] those rows are accessed, how much impact this has on the
user\program may go effectively unnoticed; however the likely
trend/tendency would be negative for performance and somewhat likely
also for storage.
As an Amazon Associate we earn from qualifying purchases.