'Re: Case Insensitive Matching in Solr/Lucene'

List:       solr-user
Subject:    Re: Case Insensitive Matching in Solr/Lucene
From:       Erick Erickson <erickerickson () gmail ! com>
Date:       2014-11-25 22:59:58
Message-ID: CAN4YXvd3gVSmvMWEETtEKbau14ATH4Tof-gscJqbhV6GSy3Vow () mail ! gmail ! com

DocValues are restricted to certain untokenized field types,
specifically string, Trie* and UUID. Those types have no analysis
chain, so LowerCaseFilter is not even in the picture.

Furthermore, switching a field to DocValues requires a complete re-index.
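
For reference, the copyField approach Alexandre describes could look
roughly like the schema.xml fragment below. This is a sketch only: the
field names country and country_facet and the type name text_lower are
made up for illustration, and it assumes a Solr 4.x release recent
enough to allow docValues on string fields.

```xml
<!-- Analyzed type for case-insensitive searching (hypothetical name) -->
<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Untokenized type; this is the kind of field that can carry DocValues -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Query against this field: tokens are lowercased at index and query time -->
<field name="country" type="text_lower" indexed="true" stored="true"/>

<!-- Facet on this field: it receives the raw, unanalyzed input value -->
<field name="country_facet" type="string" indexed="true" stored="false"
       docValues="true"/>

<!-- copyField copies the original input, before any analysis is applied -->
<copyField source="country" dest="country_facet"/>
```

With something like this in place, queries would go to country and
facet.field would point at country_facet. Adding either the new field
or the docValues attribute still means re-indexing.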

Best,
Erick

On Tue, Nov 25, 2014 at 1:26 PM, Shawn Heisey <apache@elyograg.org> wrote:
> On 11/25/2014 6:27 AM, Alexandre Rafalovitch wrote:
>> The usual solution is to have faceting using the other field (with
>> copyField). Usually it is because people want the original unmodified
>> version of the string without tokenization (so, "United States of
>> America" instead of "united" "states" "america"). It sounds like your
>> case is a little different and you do want tokenized values, just not
>> lowercased.
>
> Something I've been wondering about related to facets.  This might be a
> tangent from the original issue, but it's somewhat related, so I'm
> asking it here.
>
> It's my understanding that DocValues have the same info as stored fields
> -- that is, the original value, completely unmodified by the analysis chain.
>
> It's also my understanding that DocValues get used for sorting and
> facets if they are present.
>
> If both of these assumptions/understandings are correct, then I would
> think that simply turning on DocValues for a field with the lowercase
> filter (and reindexing) would allow case-insensitive queries *plus*
> facets with the original unmodified and untokenized values.
>
> Have I got completely the wrong idea?  I haven't tested any of this.
>
> Thanks,
> Shawn
>