On 09 Sep 2013 11:31, Jeff Young wrote:
> I have a file that contains more than 1M records that I need to copy
> to an identical file that will be empty to start.
So /filling/ vs /replacing/ is the topic; i.e. there are additional
considerations for replacing rows, so the difference is germane.
> Will it be faster using SQL INSERT or CPYF Fromrcd(1)?
Most likely CPYF. Some nuances of the actual data, files, system,
system activity, and requirements, each could make the choice of CPYF
less obvious as the faster choice.
If the fast-copy is enabled for the particular invocation of the Copy
File utility, then the CPYF surely will be fastest. The "fast copy"
feature of CPYF is effectively what CRTDUPOBJ does [it simply copies
the entire dataspace], and requires both that the FROMMBR can be
exclusively allocated and that "identical" is an accurate claim for the
Record Format, including the RcdFmt Level Identifier [the FMTOPT(*NONE)
is required either way, IIRC]. The use of CRTFILE(*YES) helps ensure
the /identical/ requirements are satisfied, and thus that part of the
eligibility for the fast-copy is met.
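As a sketch [the file and library names are hypothetical], invocations
that keep the fast-copy eligible — no data mapping, target either
already identical or created identical by CPYF itself — might look like:

```
/* Copy into an existing, identically-formatted, empty target:   */
CPYF FROMFILE(MYLIB/SRCPF) TOFILE(MYLIB/TGTPF) +
     MBROPT(*REPLACE) FMTOPT(*NONE) FROMRCD(1)

/* Or let CPYF create the target, ensuring the record format     */
/* [and its level identifier] match exactly:                     */
CPYF FROMFILE(MYLIB/SRCPF) TOFILE(MYLIB/TGTPF) +
     MBROPT(*ADD) CRTFILE(*YES) FMTOPT(*NONE) FROMRCD(1)
```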
If FMTOPT(*NOCHK) is acceptable [NULL values preclude that choice],
again CPYF is likely to be faster, because there is no data mapping.
If *MAP and/or *DROP are required Format Options, then the choice
between SQL and CPYF becomes less obvious... except that some specific
optimizations based on the TOFILE() attributes are built into the CPYF
but must be effected by the user when using SQL INSERT; i.e. CPYF may
implicitly disable and re-enable keyed access paths according to some
metrics, choosing what to do based on a presumption of what actions
make the overall request complete faster.
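For comparison, the SQL route for this case is a single set-based
statement [names hypothetical]; any access-path deferral or
pre-allocation that CPYF might perform implicitly must then be effected
by the user around it:

```
INSERT INTO MYLIB.TGTPF
  SELECT * FROM MYLIB.SRCPF
```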
Both requests can benefit from maximizing SEQONLY(), assuming the
system is not already I/O constrained when copying the data. And...
If the 1M rows can fit in memory [esp. if protected from being
paged-out], then the request to Set Object Access (SETOBJACC) of the
data would benefit either request; i.e. prevent page faulting for
database read, albeit sequential read is optimized to limit faults
anyhow, and maximizing number of rows in the sequential buffer further
optimizes sequential read.
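A sketch combining the two suggestions [the pool number, buffer size,
and names are illustrative, not recommendations]:

```
/* Pin the from-file data into a shared pool so database reads   */
/* incur no page faults:                                         */
SETOBJACC OBJ(MYLIB/SRCPF) OBJTYPE(*FILE) POOL(*SHRPOOL3)

/* Maximize records per sequential buffer for both files before  */
/* issuing the copy:                                             */
OVRDBF FILE(SRCPF) SEQONLY(*YES 32767)
OVRDBF FILE(TGTPF) SEQONLY(*YES 32767)
```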
For other than fast-copy, the best results for a large number of rows
often will have by-row maintenance of the logical access paths over the
target member disabled [when allowed; i.e. non-unique] and then
re-enabled to rebuild or catch up in parallel after the copy completes.
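For a non-unique logical file over the target [name hypothetical], that
deferral might be sketched as:

```
CHGLF FILE(MYLIB/TGTLF) MAINT(*REBLD)  /* defer by-row maintenance  */
/* ... perform the CPYF or SQL INSERT ... */
CHGLF FILE(MYLIB/TGTLF) MAINT(*IMMED)  /* rebuild, resume immediate */
```

The pending rebuild can also be monitored or managed from the Edit
Rebuild of Access Paths (EDTRBDAP) display.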
Irrespective of SQL or CPYF, performance improves when an existing
target member is pre-allocated for the known number of rows to be
copied [optionally with some additional allocation for presumed
additional growth], because incremental storage extensions during the
copy are avoided. The SIZE() and ALLOCATE(*YES) parameters of the
Change Physical File (CHGPF) command effect that pre-allocation.
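A minimal sketch of that pre-allocation [the SIZE() numbers are
illustrative for ~1M rows plus growth]:

```
CHGPF FILE(MYLIB/TGTPF) SIZE(1100000 50000 3) ALLOCATE(*YES)
CLRPFM FILE(MYLIB/TGTPF)  /* allocation takes effect when the     */
                          /* member is created or cleared         */
```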
The use of the combination of FROMRCD() and TORCD() with CPYF can
enable a somewhat simplistic parallel implementation across multiple
jobs [or... presumably across multiple threads]. Doing the same with
the RRN() of the SQL is probably not a good option, and selection via
portions of a key is probably expensive per random I/O, but selection
implemented via an EVI that identifies big sequentially aligned chunks
of data potentially could avoid the random effect of reading via the
keyed access path [probably best not relied upon].
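As a sketch of that FROMRCD()/TORCD() splitting [names, record counts,
and the two-way split are illustrative; FROMRCD() and TORCD() both
refer to from-file relative record numbers]:

```
SBMJOB JOB(CPYHALF1) CMD(CPYF FROMFILE(MYLIB/SRCPF) +
       TOFILE(MYLIB/TGTPF) MBROPT(*ADD) FMTOPT(*NONE) +
       FROMRCD(1) TORCD(500000))
SBMJOB JOB(CPYHALF2) CMD(CPYF FROMFILE(MYLIB/SRCPF) +
       TOFILE(MYLIB/TGTPF) MBROPT(*ADD) FMTOPT(*NONE) +
       FROMRCD(500001) TORCD(1000000))
```

Note that with both jobs adding to the same member the arrival order is
interleaved, so this suits only an unordered fill.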
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list