×
The internal search function is temporarily non-functional. The current search engine is no longer viable and we are researching alternatives.
As a stop gap measure, we are using Google's custom search engine service.
If you know of an easy to use, open source, search engine ... please contact support@midrange.com.
On 22-Jul-2010 11:16, Charles Wilt wrote:
I'm not following this part.
Why would '% MAIN%' be better than '%MAIN%'?
One scenario will be able to cut-short its processing due to a
known length, whereas the other will have to compare the remaining
characters. Fewer matching prefix bytes means even less comparisons
across the following bytes, so there is less potential for
improvements. The cost of the comparisons are very small until the
complexity of the scan increases the amount of work. Even so, a
mostly insignificant portion of the overall amount of work.
In actuality, I tried to be smart....when I built my varying
search key field, I did the following:
replace(translate(addr1
,' '
,';:.,!@#$%¢&*()-_+={}|¬¦\"<>?/~`')
,' ','' ) as cmpAdr
So that the original data '123 MAIN ST.' became '123MAINST'
That takes care of the concern I raised.
Which lowered the average length of the data from 16 to 13. So I
was really looking for a 48% improvement...
But only an improvement for the scan processing, not for the
overall query processing. Even if measuring actual CPU-time versus
clock-time, the fixed-costs will skew downward any potential for
improvement.
The I\O is an effective fixed-cost for the table scan, increased
by a little more than 6MB in the given scenario for changing to add
the two-byte varying length. The CPU cost is near insignificant for
the comparisons against the non-matched bytes across the storage
which will already have been paged into memory; paged with similar
characteristics for both scenarios. Each will have similar CPU
costs for everything other than the text string scanning.
According to the percentages given, instead of looking for an
overall percent improvement, you are looking for an improvement only
with character scanning. Subtract from each test scenario, the
overall cost of selecting a single-byte comparison which will either
never match any row from or will match every row on the first byte;
a one-byte compare with no match includes the cost of doing all of
the I\O, but with the most minimally complex scan [compare of each
byte for length of data or field], and a one-byte compare where
every row will match on the first byte includes the cost of doing
all of the I\O while ensuring only one single-byte compare. To
prevent actual row selection, the query should be a SELECT COUNT(*)
summary so neither does any actual row\data retrieval. Finally,
review the percent difference between the two scenarios after
adjustment [for removal of the fixed I\O cost and an effective
minimal scan cost], for which an effective difference should be in
consideration of only the scanning costs.
<<SNIP>>
In my testing I am running each statement multiple times and
looking at the average of the last few runs. Though I haven't
switch the order as I'm running the statement over the fixed
length field first, so I'd expect if order mattered, the first
statement would be the one to suffer, not the second.
Averaging the last few of several runs, for each scenario, is
also a valid [and better than order reversal] approach to minimize
the impacts to the measurements. Switching order is mostly a trick
to expose any impact, and thus cost averaging across order switches
can be considerably more accurate; than if never having exposed the
flaw for only having performed the tests in one order.
<<SNIP>>
On Thu, Jul 22, 2010 at 1:07 PM, CRPence wrote:
Since a blank is not the prefix of the data being searched,
you could not see the maximum potential improvement you are
looking for.
Try searching on '% MAIN%' instead, to effect a better test;
not that I trust the performance test results for both
non-dedicated and without averaged repeated tests including
reversed order of any two compared scenarios.
As an Amazon Associate we earn from qualifying purchases.