Re: Variable length fields question -- MIDRANGE-L

On 29-Jul-2014 11:09 -0500, John R. Smith, Jr. wrote:

I have some questions about variable length fields in files created
using DDS.

<http://archive.midrange.com/midrange-l/200911/msg01287.html>
_Another Question about variable length fields_
"... Coding Techniques for SQL Performance by Kent Milligan ..."

When I learned about them, I was told that if there is a field with a
max of 100 bytes with a VARLEN(20),

FWiW, in SQL vs DDS, that would be: VARCHAR(100) ALLOCATE(20)

the system allocates 20 bytes in the record and if the value is
longer than 20 bytes, it stores the additionally needed bytes
elsewhere.

That is an essentially correct description of the function, generally, as implemented. Nuances, for example due to minimums or other factors, may have some alternative implementations. As implementation details, they are subject to change, but what is exposed via an interface to the programmer remains unchanged.

A copy of the referenced document in the above archive link does not have a /picture/ of a similar example to VARCHAR(100) ALLOCATE(20) :-( Only a picture showing the equivalent of the minimal and maximal allocations for the chosen size, so effectively showing equivalents of both: VARCHAR(100) ALLOCATE(0) and VARCHAR(100) ALLOCATE(100)

When the record is read, the OS knows to retrieve the 20 bytes from
the record and concatenates the additional bytes from the elsewhere
storage (if any exist) and it returns a 100 byte field.

Essentially. Though when the value overflows the allocated length, I do not recall if perhaps up to the entire value might be stored in the auxiliary varlen segment.

For the row-level-access (RLA) the database [almost exclusively] returns contiguous fixed-length data for the fields of each row via a Data Management buffer; i.e. the buffer is also fixed-length with the ability to store one or many records. For the field definition from the example scenario, the database would present in the buffer, a full 102-bytes of storage that is prefixed with the unsigned 2-byte integer length attribute. For any value of length-n, any data beyond position n+2 is undefined\unpredictable, although the database will for RLA [with a buffer of fixed-length records], forcibly blank-pad the values in the 102 bytes of storage.

For most languages, what is an effective host variable [generically, not solely as an SQL HV, e.g. a DS from an external Record Format] is similarly 102 bytes of storage allocated for that example. However what the database actually /returns/ to a program is a varying length result of a known value and length; what the recipient language [user] _perceives_ as the representation of that result is immaterial, from the perspective of the database. The value might be represented in a language as two contiguous scalar primitives, an integer *BIN2 and a *CHAR LEN(100) each with maximally allocated storage, but the database is returning only the actual value as both a length and the number of characters of that length; i.e. there is no requirement that the database had reserved\carved-out the storage for the number of bytes beyond the length of the string up to the maximum length of the field, and why any data beyond the defined /length/ of the actual value could be anything.

The net result is less DASD tied up with extraneous spaces in the
file with no impact to the developer.

Indeed, the fixed-length record storage requirements can be reduced, by having eliminated blank-padding for values shorter than the allocated length of both fixed-length and varying length columns. Although varying length column values are not padded, the minimal amount of storage is the allocated length, but data for values exceeding the allocated length can be stored in a more compact manner external to the fixed-portion of the row [per reduction in those "extraneous spaces" that would otherwise be maintained in the fixed-length record storage within the dataspace].

However, when I look at the file via DSPPFM or WRKLNK <sic; DSPF>, I
see all 100 bytes even if all 100 are spaces.

The Display Physical File Member (DSPPFM) and the Display File [Data] (DSPF) commands [note: not the Work With Object Links (WRKLNK), which is a menu feature, not a means to view the data] are both using the RLA and each feature presents row\record data as fixed-length with data from contiguous fields implying storage [start and end] locations that match the output from the Display File Field Descriptions (DSPFFD) command. There is no method provided by the operating system database to present the distinct locations of the sub-components of varying-length data; the actual location of those, is known only to the LIC database, and the column data is only returned by the LIC to the DB2 for i as one entity.

FWiW: When I want to visualize the actual storage of the data, I will dump a copy of the dataspace and vlen aux SID using the STRSST D/A/D; the database file member from which data would be dumped would start as an empty duplicate made from the file with the desired record format [column definitions], since populated with just some few specific data records that have some variations of data that would be both easy-to-see+find and of various length(s) -- possibly one as the empty string, another below the allocated length, one matching the allocated length, and another [probably a couple] overflowing the allocated length.

My questions are:
1) Is the OS smart enough to pad out the 21-100 bytes when displayed
via WRKLNK <sic; DSPF> and DSPPFM so I don't see the difference or is
my understanding of how the DDS VARLEN works flawed?

I am unsure I understand the question. But the DB2 for i will present the datum as a length attribute and that much data following. The RLA /read/ feature will pad short values with blanks, mostly only for consistency, because the RLA is providing what is inherently a fixed-length record interface [up to 32K].

2) Assuming my VARLEN is guessed correctly and most records fit into
the allocated bytes, is there any noticeable impact with using VARLEN
fields?

Given the allocated-length specification on the VARLEN() or ALLOCATE() allows a large proportion of rows having the varying data fit without overflowing the allocated length, then the effect could be noticeably positive due to reduced overall storage in the dataspace and minimal secondary read\write activity required [to access the data in the variable-len segment].

The use of VARLEN, with or without a allocated-length specification, can have a "noticeable impact" on the processing of the data; most notably, how the data is stored and retrieved. How much impact this has on the user\programmer may go effectively unnoticed, or may be noticed as either a positive or negative impact. The programmer+DBA may need to make a decision that tempers the requirements for the LIC DB to read\write data outside of the fixed-portion of the dataspace; a choice that is intended to increase any positive effects and limit any negative effects. The likely trend\tendency would be positive for storage and performance.

3) If my guess is incorrect and a big chunk of records require the
additional 21-100 bytes, does this have a huge performance impact?

Given the allocated-length specification on the VARLEN() or ALLOCATE() allows only a small proportion of rows having the data fit without overflowing, then the effect could be noticeably negative due to the overall amount of data stored outside the dataspace, maximizing secondary read\write activity required [to access so much of the data within the variable-len segment]. Depending on how the file is used, how [often] those rows are accessed, how much impact this has on the user\program may go effectively unnoticed; however the likely trend/tendency would be negative for performance and somewhat likely also for storage.