On 12-Aug-2014 17:45 -0500, Vinay Gavankar wrote:
<<SNIP>>
This is the full scenario:
File A is unique by Id, and has a trigger program for *INSERT and
*UPDATE (records are not being deleted from the file). The trigger
program writes the Id and Before and After Images to a data queue.
File A is being updated by multiple batch as well as interactive
programs, and it is "possible" that the same Id gets updated in rapid
succession (batch programs process hundreds of records per second).
The program I am working on reads the data queue and I want to
capture an audit trail by writing the Id, Seq# and the Before Image
of the record. There are multiple instances of this program running,
reading from the same data queue.
<<SNIP>>
It is also important that the records are written in the sequence
they were updated; that is why I wanted to avoid trapping for the
error on the Write and looping, in the first place.
Adding this logic in the trigger program would of course solve the
problem, but that is out of the question, as it would slow down all of
the batch programs.
This new information, both about how the data for writing to the file
was derived [read concurrently from a data queue] and about how that
data must be written to the file in chronological order, complicates
the scenario. Prior suggestions [mine, and as I recall those of others
also] are effectively nullified, without either some additional work to
effect this newly identified serialization requirement or instead a
change to the design\implementation toward the same ends.
As I now understand the scenario, the file to which the data is being
written serves as an effective journal receiver, logging the
/sequential/ I\O operations performed against a particular file with a
trigger; the file with the trigger will be called the audited-file, and
the file having the effective audit-records written will be called the
log-file. The /sequence/ of the update activity performed against any
particular Id value in the audited-file, as recorded in the rows written
to the log-file, must reflect the order of the [triggered] I/O
operations performed against the [particular Id value of the]
audited-file; each successive update [to a particular Id value] should
have [however seemingly illogical] a decremented sequence number.
Serializing the writes merely to prevent duplicates and to produce a
decremented sequence is quite simply effected using techniques already
described in prior responses. Serializing the writes to *also* reflect
a chronology is not so simple. Although obtained serially and stored as
such, the data in the described scenario can be read concurrently, and
thus lacks any predictable order when that data is then written to the
log-file *unless* each writer is serialized [reflecting the same order
each message\entry was removed from the data queue].
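  For example [a minimal sketch only, with hypothetical names: LOGPF is
the log-file, and a unique key over (ID, SEQ) is presumed, such that a
concurrent duplicate fails the insert and the writer traps and retries
per the looping technique already mentioned]:

    -- Claim the next lower sequence for this Id within the insert
    -- itself; COALESCE supplies the start value when no row exists.
    INSERT INTO logpf (id, seq, before_img)
      SELECT :id
           , COALESCE(MIN(seq), 0) - 1
           , :before_img
        FROM logpf
       WHERE id = :id

  That alone, however, says nothing of chronology; two writers that
dequeued their entries in one order can still arrive at that INSERT in
the opposite order.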
The problem of course is the asynchronous nature of the concurrent
reads from the data queue, giving rise to an unpredictable order of
insertions into the log-file. Intuitively, the invocation doing the
insert operation cannot be responsible for determining the next
sequence number.
A conspicuous means to resolve this would seem to involve a timestamp
[along with what is a default and presumed-valid assumption that time
will only ever move forward], taken either when the I/O occurred or when
the data was removed from the queue; at either point in the process, the
data is known to be properly ordered. Each invocation performing the
insertion of the audit-row would then be required not only to add the
new sequence number, but also to ensure that the sequence reflected the
information from the timestamp. However, that is not easily
accomplished: just as one process does not know what order its own
entry [the one it dequeued] held on the queue relative to any other
[that another process dequeued], each process is also unaware of the
timestamp obtained by any other [irrespective of whether the timestamp
was set when the entry was enqueued or when the entry was dequeued].
To relax the requirement that the timestamp remain relevant at the
time of logging [writing to the log-file], the writer could simply store
the timestamp instead of a sequence number; the sequencing would be
deferred to read-time, for example using an OLAP query to establish a
ROW_NUMBER() ordered over the timestamp column.
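  A sketch of that deferral [names hypothetical; the TS column would be
populated from the timestamp captured in the trigger or at dequeue, per
whichever point is chosen]:

    -- The log-file stores the timestamp rather than a sequence number.
    CREATE TABLE loglib.logpf
      ( id          CHAR(10)  NOT NULL
      , ts          TIMESTAMP NOT NULL
      , before_img  CHAR(200) NOT NULL
      )

    -- Sequencing is established only at read-time; ordering the
    -- timestamp descending per Id is one way to mimic the
    -- decremented-sequence effect, the newest update drawing the
    -- lowest number.
    CREATE VIEW loglib.logseq AS
      SELECT id
           , ROW_NUMBER() OVER (PARTITION BY id
                                ORDER BY ts DESC)  AS seq
           , before_img
        FROM loglib.logpf

  Because each row carries its own ordering datum, the concurrent
writers then need no coordination at all; only the timestamp source
must be trustworthy.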
I am not familiar with using SQL, but are you all saying that if
this file were to be defined in DDL, then the system will take care of
inserting the records in the correct sequence, even if more than one
additional request was made while the first one was still being
completed? If so, will that command also return the sequence number
just inserted, so that I can use it for some other purpose?
The SQL can define a file that will automatically populate a sequence
value, the identity column, and the just-inserted identity value is
available to the program [that just inserted the row] in a couple of
different ways. However, that is moot, as the described scenario cannot
benefit directly from that feature alone, as discussed above.
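  For completeness, a sketch of both [hypothetical names; an INCREMENT
BY -1 could even yield the decrementing values, though again with no
chronological guarantee across concurrent writers]:

    -- The identity column assigns the sequence at insert-time.
    CREATE TABLE loglib.logid
      ( seq         INTEGER GENERATED ALWAYS AS IDENTITY
                      (START WITH 1, INCREMENT BY 1)
      , id          CHAR(10)  NOT NULL
      , before_img  CHAR(200) NOT NULL
      )

    INSERT INTO loglib.logid (id, before_img)
      VALUES('A0001', 'before-image')

    -- One way to obtain the just-assigned value:
    SELECT IDENTITY_VAL_LOCAL() FROM sysibm.sysdummy1

    -- Another: retrieve it directly from the insert.
    SELECT seq FROM FINAL TABLE
      ( INSERT INTO loglib.logid (id, before_img)
          VALUES('A0002', 'before-image') )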