'Re: SynonymGraphFilter'

List:       lucene-user
Subject:    Re: SynonymGraphFilter
From:       baris.kazar () oracle ! com
Date:       2018-09-13 13:33:57
Message-ID: b098f216-120d-c273-271c-0017eb624082 () oracle ! com

Thanks Michael. I think this answers my questions.

Best regards


On 9/12/18 8:23 PM, Michael Sokolov wrote:
> Usually one will either apply synonyms at index time or apply them at query
> time, but not both. I think the situation is that you will get the most
> correct behavior, respecting synonym graph structure, with query-time synonyms.
> 
> Index time synonyms may give better performance, but at the cost of some
> overlap along time positions that results from the need for flattening, as
> in the quote you provided. If you use only query time synonyms there is no
> need to flatten.
> 
> On Thu, Sep 13, 2018, 12:59 AM <baris.kazar@oracle.com> wrote:
> 
> > Any examples on the following note on the Javadocs at
> > 
> > https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html
> > 
> > Quoted from the above url:
> > 
> > However, if you use this during indexing, you must follow it with
> > FlattenGraphFilter to squash tokens on top of one another like
> > SynonymFilter, because the indexer can't directly consume a graph. To
> > get fully correct positional queries when your synonym replacements are
> > multiple tokens, you should instead apply synonyms using this
> > TokenFilter at query time and translate the resulting graph to a
> > TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery.
> > 
> > End of quote
> > 
> > 
> > This will make the code really hard to maintain if we separate synonyms
> > based on the number of tokens.
> > 
> > Any suggestions please?
> > 
> > Best regards
> > 
> > 
> > 
> > 
> > On 9/11/18 1:45 PM, baris.kazar@oracle.com wrote:
> > > Mike,
> > > 
> > > Great article, thanks for that. I was thinking about exactly this
> > > reverse mapping when I was writing this question. I guess it would be
> > > nicer if Lucene applied both mappings when one is requested, or offered
> > > a parameter to activate this double mapping.
> > > 
> > > 
> > > My next question is: can a synonym be separated by a space?
> > > 
> > > One last question on this: should I repeat this at both index and
> > > query time?
> > > Best regards
> > > 
> > > On 9/11/18 1:39 PM, Michael McCandless wrote:
> > > > Try reading the blog post I wrote about token stream graphs?
> > > > 
> > > > 
> > > > http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> > > > 
> > > > Mike McCandless
> > > > 
> > > > 
> > > > http://blog.mikemccandless.com
> > > > 
> > > > On Tue, Sep 11, 2018 at 1:35 PM, <baris.kazar@oracle.com> wrote:
> > > > 
> > > > > Any comments please?
> > > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > > On 9/10/18 5:07 PM, baris.kazar@oracle.com wrote:
> > > > > 
> > > > > > > Any examples on this? I think it would be nice if the Javadocs had
> > > > > > > an example on this:
> > > > > > 
> > > > > > > However, if you use this during indexing, you must follow it with
> > > > > > > FlattenGraphFilter to squash tokens on top of one another like
> > > > > > > SynonymFilter, because the indexer can't directly consume a graph.
> > > > > > > To get fully correct positional queries when your synonym
> > > > > > > replacements are multiple tokens, you should instead apply synonyms
> > > > > > > using this TokenFilter at query time and translate the resulting
> > > > > > > graph to a TermAutomatonQuery e.g. using
> > > > > > > TokenStreamToTermAutomatonQuery.
> > > > > > 
> > > > > > > Does "multiple tokens" mean a synonym with multiple equivalents,
> > > > > > > or a synonym made up of multiple words?
> > > > > > > 
> > > > > > > This is not clear to me.
> > > > > > 
> > > > > > Best regards
> > > > > > 
> > > > > > 
> > > > > > On 9/10/18 3:15 PM, baris.kazar@oracle.com wrote:
> > > > > > 
> > > > > > > > https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html
> > > > > > > 
> > > > > > > > Does this mean I don't have to repeat it in the search analyzer
> > > > > > > > when I do this at indexing time?
> > > > > > > 
> > > > > > > Best regards
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > > > 
> > > > > > 
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > > 
> > > > > 
> > > 
> > > 
> > 
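
For anyone reading this thread later, the position overlap Michael mentions
can be made concrete with a small toy model. This is plain Java with no
Lucene dependency, and every name in it (SynonymGraphSketch, Tok, paths,
matchesPhrase) is hypothetical. It assumes an illustrative synonym rule
"dns" -> "domain name system" applied to the text "dns is up"; that rule is
not from this thread. Here "multiple tokens" is read as a replacement that
analyzes to several words. The squashed layout is written by hand to mimic
"tokens on top of one another"; the exact output of FlattenGraphFilter may
differ, so treat this as a sketch of the idea only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; this is NOT Lucene code and all names are hypothetical. */
final class SynonymGraphSketch {

    /** A token as a graph arc: it leaves node `from` and arrives at node `to`. */
    static final class Tok {
        final String term;
        final int from;
        final int to;
        Tok(String term, int from, int to) { this.term = term; this.from = from; this.to = to; }
    }

    /** Graph for "dns is up" with the assumed rule dns -> domain name system. */
    static final List<Tok> GRAPH = Arrays.asList(
        new Tok("dns", 0, 3),      // spans three positions
        new Tok("domain", 0, 1), new Tok("name", 1, 2), new Tok("system", 2, 3),
        new Tok("is", 3, 4), new Tok("up", 4, 5));

    /** Query-time view: the graph is intact, so every path can be enumerated. */
    static List<String> paths(List<Tok> graph, int node, int end) {
        List<String> out = new ArrayList<>();
        if (node == end) { out.add(""); return out; }
        for (Tok t : graph) {
            if (t.from != node) continue;
            for (String rest : paths(graph, t.to, end)) {
                out.add(rest.isEmpty() ? t.term : t.term + " " + rest);
            }
        }
        return out;
    }

    /** Index-time view (hand-written assumption): only positions survive, terms are stacked. */
    static final List<Set<String>> SQUASHED = Arrays.asList(
        new HashSet<>(Arrays.asList("dns", "domain")),
        new HashSet<>(Arrays.asList("is", "name")),
        new HashSet<>(Arrays.asList("up", "system")));

    /** Exact phrase match against stacked positions, with no slop. */
    static boolean matchesPhrase(List<Set<String>> positions, String phrase) {
        String[] terms = phrase.split(" ");
        for (int start = 0; start + terms.length <= positions.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < terms.length && ok; i++) {
                ok = positions.get(start + i).contains(terms[i]);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(paths(GRAPH, 0, 5));
        for (String q : new String[] {"dns is up", "domain name system is up", "dns name up"}) {
            System.out.println(q + " -> " + matchesPhrase(SQUASHED, q));
        }
    }
}
```

In this model the intact graph yields exactly the two readings "dns is up"
and "domain name system is up". Against the squashed positions, the phrase
"domain name system is up" no longer matches, while the nonsense phrase
"dns name up" does, which is the kind of overlap that query-time synonyms
avoid.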


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

