List:       lucene-dev
Subject:    Re: How to retain % sign against numbers in lucene indexing/ search
From:       Mikhail Khludnev <mkhl () apache ! org>
Date:       2023-07-12 16:16:39
Message-ID: CAF8TkC7fRhBLtmUg_15LZp1exzbbYMYDP0UxdWgDm4tj8o8bmQ () mail ! gmail ! com

Hello Amitesh.
If StandardTokenizer does strip it (it's worth double-checking on the Solr Admin
Analysis screen), you can experiment with WhitespaceTokenizer.
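
To make the difference concrete, here is a minimal sketch in plain Java. It does not call Lucene at all: whitespaceTokens and alnumTokens are hypothetical stand-ins that only approximate splitting on whitespace versus keeping letter/digit runs, and the class name PercentTokenDemo is made up for illustration. The real tokenizers may differ in detail, so treat this as a way to reason about the sample docs rather than as the actual analysis chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentTokenDemo {

    // Stand-in for whitespace tokenization plus lowercasing:
    // split on runs of whitespace only, so "50%" stays one token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Rough approximation (an assumption, not Lucene's algorithm) of a
    // tokenizer that keeps only letter/digit runs, so "%" and "-" are dropped.
    static List<String> alnumTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // A document matches when it contains every query token.
    static boolean matches(List<String> docTokens, List<String> queryTokens) {
        return docTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String[] docs = {
            "Number of boys 50",
            "My score was 50%",
            "40-50% for pass score"
        };
        String query = "50%";
        for (String doc : docs) {
            boolean ws = matches(whitespaceTokens(doc), whitespaceTokens(query));
            boolean an = matches(alnumTokens(doc), alnumTokens(query));
            System.out.println(doc + " | whitespace: " + ws + " | alnum: " + an);
        }
    }
}
```

Note that with whitespace-only splitting, "40-50%" stays a single token, so a plain term match on "50%" would hit the second sample doc but not the third. If ranges like that must match too, some extra handling of the hyphen would likely be needed; the Analysis screen is the place to confirm what the real chain emits.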

On Wed, Jul 12, 2023 at 3:33 PM Amitesh Kumar <amitesh116@gmail.com> wrote:

> Hi Group,
>
> I am facing a requirement change to retain the % sign in searches, e.g.:
>
> Sample search docs:
> 1. Number of boys 50
> 2. My score was 50%
> 3. 40-50% for pass score
>
> Search query: 50%
> Expected results: Doc-2, Doc-3 i.e.
> My score was 50%
> 40-50% for pass score
>
> Actual result: All 4 documents
>
> On the implementation front, I am using a set of filters such as
> LowerCaseFilter, EnglishPossessiveFilter, etc., in addition to the base
> tokenizer, StandardTokenizer.
>
> My analysis suggests that StandardTokenizer strips off the % sign, hence
> the behavior. Has someone faced similar requirements? Any help/guidance is
> highly appreciated.
>
> *Warm Regards,*
> *Amitesh  K*
>


-- 
Sincerely yours
Mikhail Khludnev



