


I have just started reading about Encoded Vector Indexes (EVIs).
One of the points that seems to be stressed is the cardinality of
the key.
The states of the USA are a good choice because the cardinality is low: there are only 50 of them.
Timestamps are bad, because you can have up to a kajillion of them.
How do you determine what is good and what is bad?
For instance, would a date field (or at least a numeric field which
represents a date) be considered a bad choice as far as cardinality
goes?
Cardinality - I hope I have the correct word.

I've created them over fields with a much higher number of values. If you
look at the parameters for creation you will see that it can support
65,535 distinct values with a 2 byte code. Just be aware that the number
of possible values allowed will be based on the number of distinct values
in the file, unless you purposely force it to reserve the larger 4 byte
code size. This is one reason it's handy to remove the EVIs before large
updates to the underlying file: you risk exceeding the number of distinct
values for which the EVI was created.
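
To make the code-size point concrete, here is a rough sketch of how the code width relates to the number of distinct values. It assumes the usual 1, 2, and 4 byte code widths with ceilings of 255 and 65,535 values for the two smaller sizes; the function name `evi_code_width` is made up for illustration and is not part of any IBM API.

```python
def evi_code_width(distinct_values: int) -> int:
    """Return the assumed EVI code width in bytes for a distinct-value count.

    Assumption: 1 byte covers up to 255 values, 2 bytes up to 65,535,
    and anything larger needs the 4 byte code.
    """
    if distinct_values < 1:
        raise ValueError("distinct_values must be at least 1")
    if distinct_values <= 255:       # fits in one unsigned byte
        return 1
    if distinct_values <= 65_535:    # fits in two unsigned bytes
        return 2
    return 4                         # beyond the 2 byte ceiling


for n in (50, 255, 256, 42_000, 65_535, 65_536):
    print(f"{n:>6} distinct values -> {evi_code_width(n)} byte code")
```

The thing to watch is a key whose count sits just under one of these boundaries, since a large update can push it over.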

Even if you don't exceed that number, it is still better to remove and add
back the EVIs when doing large updates. For a datamart we build on a
nightly basis, it is MUCH faster to remove the EVIs, rebuild the datamart
(a single, large table), then add back the EVIs than to update the file
while the EVIs are present.
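
The shape of that nightly job is roughly the following. This is only a sketch of the drop, reload, re-create pattern: it uses SQLite (via Python's `sqlite3`) with an ordinary index as a stand-in, because SQLite has no EVIs, and the table, index, and function names are invented for the example. On the IBM i side the equivalent statements would be `DROP INDEX` and `CREATE ENCODED VECTOR INDEX`.

```python
import sqlite3


def rebuild_with_index_dropped(conn, rows):
    """Drop the index, reload the table, then re-create the index."""
    cur = conn.cursor()
    cur.execute("DROP INDEX IF EXISTS sales_zip_ix")   # 1. remove the index
    cur.execute("DELETE FROM sales")                   # 2. rebuild the table
    cur.executemany(
        "INSERT INTO sales (zip, amount) VALUES (?, ?)", rows
    )
    cur.execute("CREATE INDEX sales_zip_ix ON sales (zip)")  # 3. add it back
    conn.commit()


# Hypothetical datamart table with an index over the zip code column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (zip TEXT, amount REAL)")
conn.execute("CREATE INDEX sales_zip_ix ON sales (zip)")

# Made-up sales rows spread over 500 zip codes.
rows = [(f"{29000 + i % 500:05d}", float(i)) for i in range(10_000)]
rebuild_with_index_dropped(conn, rows)

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
index_names = [r[1] for r in conn.execute("PRAGMA index_list('sales')")]
print(row_count, index_names)
```

The point is that the index is built once over the finished table instead of being maintained row by row during the load.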

As far as your date field goes, will you have more than 65,535 possible
values? I know the standard IBM recommendation is based on 50 state
codes, but with our little datamart we saw considerable speed improvement
when we added an EVI over the zip codes used in our sales history. Way
more than 50 values, but still fewer than the 65k number.
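
For a date field, the arithmetic is easy to check. The sketch below counts the distinct calendar dates in a range and compares the count with the 65,535 figure; the ten-year range and the helper name `distinct_dates` are assumptions for illustration, and it takes the worst case where every day in the range appears in the file.

```python
from datetime import date, timedelta

LIMIT = 65_535  # distinct-value ceiling discussed above


def distinct_dates(first: date, last: date) -> int:
    """Number of calendar days in the inclusive range first..last."""
    return (last - first).days + 1


# Hypothetical ten-year sales history, one row or more for every day.
n = distinct_dates(date(2000, 1, 1), date(2009, 12, 31))
print(n, n <= LIMIT)

# Last date that still fits under the limit if every day from
# 2000-01-01 onward shows up in the file.
last_ok = date(2000, 1, 1) + timedelta(days=LIMIT - 1)
print(last_ok)
```

Since 65,535 days is on the order of 179 years, a plain date key stays well under that number for any realistic history. A timestamp is a different story because nearly every row can carry its own value.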


Andrew Lopez
Systems Analyst





This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.
