List:       sas-l
Subject:    Re: Searching Character Field for Nonexact Matches?
From:       Sigurd Hermansen <HERMANS1 () WESTAT ! COM>
Date:       2002-07-31 22:24:11

The SQL query

select
  max(1 - (length(t1.searchStr) * spedis(t1.searchStr, t2.searchStr) / 200),
      0.1) as score
from dsn1 as t1, dsn2 as t2
;

computes a 'similarity score' for every pairing of a row in dsn1 with a row
in dsn2, using the SPEDIS() function. The score has a floor of 0.1, and higher
values mean closer spellings. Given a sufficiently high score, a search
program might ask the user whether that 'fuzzy hit' is what they meant. SAS
Institute (SI) uses a similar scoring method to test for misspellings in SAS
programs. I adapted the expression from an SI example.
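
As a rough sketch of how the score might be put to work, the step below builds
two tiny tables and keeps only the pairs that score above a cutoff. The sample
rows, the output names (fuzzyHits, typed, candidate) and the 0.8 cutoff are
assumptions made up for illustration and would need tuning against real data.
Note that SPEDIS() is asymmetric, so the order of its two arguments matters.

```sas
/* Hypothetical sample data: dsn1 holds what the user typed, */
/* dsn2 holds candidate words to compare against.            */
data dsn1;
  length searchStr $20;
  input searchStr $;
  datalines;
vacine
;
run;

data dsn2;
  length searchStr $20;
  input searchStr $;
  datalines;
vaccine
vacation
measles
;
run;

proc sql;
  create table fuzzyHits as
  select t1.searchStr as typed,
         t2.searchStr as candidate,
         /* same scoring expression as in the query above */
         max(1 - (length(t1.searchStr) * spedis(t1.searchStr, t2.searchStr) / 200),
             0.1) as score
  from dsn1 as t1, dsn2 as t2
  where calculated score >= 0.8   /* assumed cutoff, not a recommended value */
  order by score desc;
quit;
```

For a long text field, one option would be to split each entry into words
first (for example with SCAN() in a DATA step) so that dsn2 holds one word per
row before the comparison runs.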

Sig
-----Original Message-----
From: James, Steve [mailto:spj1@CDC.GOV]
Sent: Wednesday, July 31, 2002 10:26 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Searching Character Field for Nonexact Matches?


Dear SAS-L,

A colleague has a web application that allows the user to query a text field
of about 2000 characters.  Right now only an exact match search is allowed
so that if the user types "vaccine" only those entries that have the exact
term are returned (I believe it's done using the INDEX function).  It would
be nice to allow an inexact match, so that if a person typed the misspelled
term "vacine", the search would still return the same records as a search
for "vaccine".

I know that there's an operator called "Sounds Like" but I don't think that
will work for this application since I'm trying to match just a part of the
entire character field.

I've thought that Text Miner might be the ultimate solution in that it might
allow even more complex decisions about matches to be made.  Whether that's
true or not and how to hook that up to a web application are other questions
I have.

I wondered if anyone had any suggestions that they might share.

Steve James
IT Specialist
National Immunization Program
Statistical Analysis Branch
Centers for Disease Control and Prevention
(404) 639-6041 (phone)
(404) 639-1728 (fax)
sjames@cdc.gov
