On 12-Aug-2014 17:45 -0500, Vinay Gavankar wrote:
<<SNIP>>

This is the full scenario:

File A is unique by Id, and has a trigger program for *INSERT and
UPDATE (records are not being deleted from the file). The trigger
program writes the ID and Before and After Images to a data queue.

File A is being updated by multiple batch as well as interactive
programs, and it is "possible" that the same Id gets updated in rapid
succession (batch programs process hundreds of records per second).

The program I am working on reads the data queue and I want to
capture an audit trail by writing the Id, Seq# and the Before Image
of the record. There are multiple instances of this program running,
reading from the same data queue.

<<SNIP>>

It is also important that the records are written in the sequence
they were updated; that is why I wanted to avoid trapping for an
error on the Write and looping in the first place.

Adding this logic in the trigger program would of course solve the
problem, but that is out of the question, as it would slow down all
of the batch programs.

This new information, both that the data being written to the file is read concurrently from a data queue and that the rows must be written in chronological order, complicates the scenario. Prior suggestions [mine, and as I recall those of others also] are effectively nullified, absent either some additional work to effect this newly identified serialization requirement or a change to the design/implementation toward the same end.

As I now understand the scenario, the file to which the data is being written serves as an effective journal receiver, logging the /sequential/ I/O operations performed against a particular file having a trigger; the file with the trigger will be called the audited-file, and the file having the effective audit-records written will be called the log-file. The /sequence/ of the update activity performed against any particular Id value in the audited-file, written as rows to the log-file, must reflect the order of the [triggered] I/O operations performed against the [particular Id value of the] audited-file; each successive update [to a particular Id value] should have [however seemingly illogical] a decremented sequence number.

Serializing the writes merely to prevent duplicates while producing a decremented sequence is quite simply achieved using techniques already described in prior responses. Serializing the writes to *also* reflect a chronology is not so simple. Although obtained serially and stored as such, the data in the described scenario can be read concurrently, and thus lacks any predictable order when that data is then written to the log-file *unless* each writer is serialized [reflecting the same order in which each message/entry was removed from the data queue].
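The first, simpler case [duplicate-prevention plus a decremented sequence, without chronology] can be sketched in Python against SQLite as a stand-in for the log-file; the unique key on (id, seq), the retry-on-duplicate loop, and the starting seed of 1000000 are all illustrative assumptions, not the poster's actual design:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The log-file: unique on (id, seq) so a duplicate write raises an error.
con.execute("""
    CREATE TABLE log_file (
        id INTEGER, seq INTEGER, before_image TEXT,
        PRIMARY KEY (id, seq)
    )
""")

def log_update(con, rec_id, before_image):
    """Write an audit row with the next (decremented) sequence number."""
    while True:
        row = con.execute("SELECT MIN(seq) FROM log_file WHERE id = ?",
                          (rec_id,)).fetchone()
        seq = (row[0] if row[0] is not None else 1000000) - 1
        try:
            con.execute("INSERT INTO log_file VALUES (?, ?, ?)",
                        (rec_id, seq, before_image))
            return seq
        except sqlite3.IntegrityError:
            continue  # duplicate key: another writer claimed seq; retry

print(log_update(con, 7, "img1"))  # 999999
print(log_update(con, 7, "img2"))  # 999998
```

Note that this guarantees uniqueness and a decrementing sequence, but says nothing about which writer's row carries the earlier image; that is the chronology problem discussed next.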

The problem of course is the asynchronous nature of the concurrent reads from the data queue, giving rise to an unpredictable order of insertions into the log-file. Intuitively, the invocation doing the insert operation cannot be responsible for determining the next sequence number.
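The race can be illustrated in miniature with Python's `queue` and `threading` modules standing in for the data queue and the multiple reader jobs; the entry values, worker count, and sleep are illustrative only:

```python
import queue
import random
import threading
import time

dq = queue.Queue()   # stands in for the data queue
log = []             # stands in for the log-file
log_lock = threading.Lock()

for seq in range(20):
    dq.put(seq)      # entries enqueued in chronological order

def worker():
    while True:
        try:
            entry = dq.get_nowait()          # dequeue itself is serialized...
        except queue.Empty:
            return
        time.sleep(random.uniform(0, 0.01))  # ...but processing time varies
        with log_lock:
            log.append(entry)                # so insertion order is unpredictable

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(log == sorted(log))  # often False: the chronology is lost
```

Every entry arrives exactly once, but the order in which the workers reach the log reflects their scheduling, not the order of removal from the queue.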

A conspicuous means of resolution would seem to involve a timestamp [along with the default and presumed-valid assumption that time only ever moves forward], taken either when the I/O occurred or when the data was removed from the queue; at either point in the process, the data is known to be properly ordered. Each invocation performing the insertion of the audit-row would then be required not only to add the new sequence number, but also to ensure that the sequence reflected the information from the timestamp. That is not easily accomplished, however: just as one process does not know where the entry it dequeued sat on the queue relative to any entry another process dequeued, each process is also unaware of the timestamp obtained by any other [irrespective of whether the timestamp was set when the entry was enqueued or when it was dequeued].

To relax the requirement that the timestamp be acted upon at the time of logging [writing to the log-file], the writer could simply store the timestamp instead of a sequence number; the sequencing would then be deferred to read-time, for example using an OLAP query to establish a ROW_NUMBER() with an ORDER over the timestamp column.
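A sketch of that read-time sequencing, using Python's `sqlite3` in place of DB2 for i [this assumes an SQLite build with window-function support, 3.25 or later; the table layout and timestamps are invented for illustration]. The `ORDER BY ts DESC` inside the window yields the decrementing-style sequence described earlier, with the newest image numbered 1:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log_file (id INTEGER, ts TEXT, before_image TEXT)")

# Rows arrive in unpredictable order; the timestamp records the true chronology.
rows = [
    (7, "2014-08-12 17:45:02.000001", "img3"),
    (7, "2014-08-12 17:45:01.000001", "img1"),
    (7, "2014-08-12 17:45:01.500000", "img2"),
]
con.executemany("INSERT INTO log_file VALUES (?, ?, ?)", rows)

# Sequencing deferred to read-time: ROW_NUMBER() over the timestamp.
result = con.execute("""
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS seq,
           before_image
      FROM log_file
     ORDER BY seq
""").fetchall()
print(result)  # [(7, 1, 'img3'), (7, 2, 'img2'), (7, 3, 'img1')]
```

No writer coordination is needed beyond capturing the timestamp at a point where the order is still known.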

I am not familiar with using SQL, but are you all saying that if
this file were to be defined in DDL, then the system will take care
of inserting the records in the correct sequence, even if more than
one additional request was made while the first one was still being
completed? If so, will that command also return the Sequence number
just inserted, so that I can use it for some other purpose?

The SQL can define a file that will automatically populate a sequence value [an Identity column], and the value of the just-inserted identity is available to the program [that just inserted the row] in a couple of different ways. However that is moot: as discussed above, the described scenario cannot benefit directly from that feature alone.
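For illustration of the identity mechanism itself, here is the SQLite analogue via Python [on DB2 for i the column would instead be declared GENERATED ALWAYS AS IDENTITY, and the just-inserted value retrieved with IDENTITY_VAL_LOCAL() or a SELECT FROM FINAL TABLE over the INSERT; the table and column names here are made up]:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite's stand-in for an SQL identity column.
con.execute("""
    CREATE TABLE log_file (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        id  INTEGER,
        before_image TEXT
    )
""")
cur = con.execute(
    "INSERT INTO log_file (id, before_image) VALUES (?, ?)",
    (7, "img1"))
# The just-inserted identity value, available to the inserting program.
print(cur.lastrowid)  # 1
```

As noted above, this hands each writer a unique, ever-increasing number, but the number reflects insert order, not the chronology of the original updates.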


This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].

Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.