


On Wednesday, June 10, 2009 10:43 AM Jim Franz wrote:

Looking for a list of fake names to edit for exclusion. Have already excluded
FRED FLINSTONE
MICKEY MOUSE
UNKNOWN (and several variations)
N/A
Googled for a list but mostly found name generators...
Anyone have a more complete list to share?

I would suggest testing for /inclusion/ instead of /exclusion/: generate a list of real names and check each entry for a match against it. Optionally, assign a weight for the likelihood that a name is presumed legitimate; that could be done by a synchronous or asynchronous service whenever an obvious or exact match is not found in the database of known names, with a null weight indicating that the name has not yet been evaluated. An existing service such as a phone directory lookup may serve as the database of presumed-to-be-legitimate names, instead of building a private version of such a database. FWiW, for many applications I would probably process in batches rather than upon input; the application was not described in the original post [to which I did not reply directly, since that message does not appear on the group].
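As a rough illustration of the inclusion test, here is a minimal Python sketch. Everything in it is an assumption made for the example: the tiny KNOWN_GIVEN_NAMES and KNOWN_SURNAMES sets stand in for a real database or directory lookup, the half-point scoring is arbitrary, and the row layout (a dict with 'name' and 'weight') is hypothetical. A weight of None plays the role of the null weight, meaning "not yet evaluated".

```python
from typing import Optional

# Hypothetical stand-ins for a real database of known names.
KNOWN_GIVEN_NAMES = {"MARY", "JOHN", "LINDA", "ROBERT"}
KNOWN_SURNAMES = {"SMITH", "JOHNSON", "GARCIA"}


def legitimacy_weight(full_name: str) -> float:
    """Score a name from 0.0 to 1.0 by matching its first and last
    tokens against the known-name sets (half a point each)."""
    tokens = full_name.upper().split()
    if not tokens:
        return 0.0
    given_ok = tokens[0] in KNOWN_GIVEN_NAMES
    surname_ok = tokens[-1] in KNOWN_SURNAMES
    return (given_ok + surname_ok) / 2


def evaluate_batch(rows: list) -> list:
    """Batch pass: fill in the weight only for rows whose weight is
    None, leaving previously evaluated rows untouched."""
    for row in rows:
        if row["weight"] is None:
            row["weight"] = legitimacy_weight(row["name"])
    return rows


if __name__ == "__main__":
    sample = [
        {"name": "Mary Smith", "weight": None},
        {"name": "MICKEY MOUSE", "weight": None},
        {"name": "N/A", "weight": None},
    ]
    for row in evaluate_batch(sample):
        print(row["name"], row["weight"])
```

With these sample sets, entries such as MICKEY MOUSE or N/A simply fail to match and score 0.0, so no exclusion list needs to be maintained for them.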

From what I have read, there is an entire industry that does /data cleansing/ for names and addresses, and its vendors may have [web] services available that use their proprietary databases and algorithms to make such decisions. Address, GIS, or some other correlative information is probably required to accompany any names. Such services are presumably used mostly for what is often called /deduplication/, but perhaps they also have some means of diagnosing fake or frivolous information as data that should be purged. With a legitimacy weighting from an internal lookup, and possibly an internal software evaluation, better choices can be made to limit the number of rows that need further human or paid cleansing action; noting, of course, the obvious loss of the ability to consistency-check or de-duplicate against any other presumed-valid rows in the database.
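To show how an internal weighting could limit the rows sent on for human or paid cleansing, here is a small Python sketch. The row layout (a dict with a 'weight' key), the function name, and the 0.5 threshold are all assumptions for illustration; no particular cleansing service or its API is implied.

```python
def split_for_cleansing(rows, threshold=0.5):
    """Partition rows into those accepted on the internal weight alone
    and those that still need human or paid cleansing.

    A weight of None (not yet evaluated) is treated as needing review.
    """
    accepted, needs_review = [], []
    for row in rows:
        weight = row.get("weight")
        if weight is not None and weight >= threshold:
            accepted.append(row)
        else:
            needs_review.append(row)
    return accepted, needs_review


if __name__ == "__main__":
    sample = [
        {"name": "Mary Smith", "weight": 1.0},
        {"name": "MICKEY MOUSE", "weight": 0.0},
        {"name": "Pat Jones", "weight": None},
    ]
    accepted, needs_review = split_for_cleansing(sample)
    print(len(accepted), "accepted;", len(needs_review), "for further cleansing")
```

Only the needs_review rows would then be passed along, which keeps the volume (and presumably the cost) of any external cleansing down.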

http://www.google.com/#hl=en&num=100&q="data+cleansing"+services
http://www.ibm.com/developerworks/webservices/library/ws-soa-infoserv3
http://www.google.com/#hl=en&num=100&q="data+cleansing"+"web+services"

Regards, Chuck



This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
