List:       solr-user
Subject:    Re: ExtendedDismaxQParser changes
From:       elisabeth benoit <elisaelisaelisa () gmail ! com>
Date:       2024-05-07 14:30:16
Message-ID: CAG-A-8gi=Ro1=zRXdAQnSXmg5ZKne+sF6EsYSX3KPr4ey3HTww () mail ! gmail ! com


For the record, I solved this problem by removing stop words in my analyzer
for wordfield.

We often get this problem when there are stop word discrepancies between
fields.
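
To illustrate why mismatched analysis between fields made the query return
nothing, here is a minimal Python sketch. It is not Solr code: the two
functions and the sample document are hypothetical, and they only model the
two parsedQuery shapes quoted further down in this thread, ignoring synonyms
and scoring.

```python
from typing import Dict, List, Set

# A toy document: field name -> set of indexed terms (hypothetical data).
Doc = Dict[str, Set[str]]


def term_centric_match(doc: Doc, terms: List[str], fields: List[str], mm: int) -> bool:
    """Models +(DisMax(f1:t1 | f2:t1) DisMax(f1:t2 | f2:t2) ...)~mm.

    A term counts as matched if it is found in ANY of the fields.
    """
    hits = sum(1 for t in terms if any(t in doc.get(f, set()) for f in fields))
    return hits >= mm


def field_centric_match(doc: Doc, terms: List[str], fields: List[str], mm: int) -> bool:
    """Models +DisMax((f1:t1 f1:t2 ...)~mm | (f2:t1 f2:t2 ...)~mm).

    At least one field must reach mm matched terms on its own.
    """
    return any(
        sum(1 for t in terms if t in doc.get(f, set())) >= mm
        for f in fields
    )


terms = ["musee", "maillol", "61", "r", "grenelle", "75007", "paris"]
fields = ["wordfield", "edgefield"]

# Query words are spread over the two fields; no single field holds all seven.
doc: Doc = {
    "wordfield": {"musee", "maillol"},
    "edgefield": {"61", "r", "grenelle", "75007", "paris"},
}

print(term_centric_match(doc, terms, fields, mm=7))   # True
print(field_centric_match(doc, terms, fields, mm=7))  # False
```

With mm=7, the second shape needs one field to match every word by itself.
As far as I understand from the replies below, edismax falls back to that
shape when the fields produce a different number of tokens, which is why
aligning stop word handling between the fields helped in my case.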

On Thu, Nov 16, 2023 at 09:28, elisabeth benoit <elisaelisaelisa@gmail.com>
wrote:

> 
> Thanks a lot for taking time to answer.
> 
> I'll have to figure out a workaround. Decreasing mm is not an option for
> me; maybe I can use a boost for this extra field.
> 
> Best regards,
> Elisabeth
> 
> On Tue, Nov 14, 2023 at 12:05, Mikhail Khludnev <mkhl@apache.org> wrote:
> 
> > Ok. Right.
> > (one two three four five six seven)~7 means match all of them, i.e. in fact
> > +one +two +three +four +five +six +seven
> > Here we can see that the way dismax handles fields with different analyzers
> > is far from perfect.
> > You can either decrease mm
> > 
> > https://solr.apache.org/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter
> >  or experiment with mm.autoRelax=true
> > 
> > https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html#TheExtendedDisMaxQueryParser-Themm.autoRelaxParameter
> >  
> > 
> > On Mon, Nov 13, 2023 at 10:33 PM elisabeth benoit <
> > elisaelisaelisa@gmail.com>
> > wrote:
> > 
> > > Okay, thanks for the answer. The thing is:
> > > 
> > > when there is no *wordfield* in the *qf* param, but only *edgefield1* and
> > > *edgefield2*, I get this parsedQuery
> > > 
> > > parsedQuery =
> > > +(DisjunctionMaxQuery(((edgefield1:musee)^1.1 | edgefield2:musee))
> > > DisjunctionMaxQuery(((edgefield1:maillol)^1.1 | edgefield2:maillol))
> > > DisjunctionMaxQuery(((edgefield1:61)^1.1 | edgefield2:61))
> > > DisjunctionMaxQuery(((edgefield1:r)^1.1 | edgefield2:r))
> > > DisjunctionMaxQuery(((edgefield1:grenelle)^1.1 | edgefield2:grenelle))
> > > DisjunctionMaxQuery(((edgefield1:75007)^1.1 | edgefield2:75007))
> > > DisjunctionMaxQuery(((edgefield1:paris)^1.1 | edgefield2:paris)))~7
> > > 
> > > and Solr does return documents
> > > 
> > > but when I instead have *wordfield* and *edgefield* in *qf*, I get this
> > > parsedQuery
> > > 
> > > parsedQuery =
> > > "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
> > > Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
> > > wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee
> > > edgefield:maillol edgefield:61 edgefield:r edgefield:grenelle
> > > edgefield:75007 edgefield:paris)~7)))"
> > > 
> > > and Solr does not return any documents.
> > > 
> > > That is what makes me think there is something wrong with the second
> > > parsedQuery.
> > > 
> > > Best regards,
> > > Elisabeth
> > > 
> > > 
> > > 
> > > On Mon, Nov 13, 2023 at 20:15, Mikhail Khludnev <mkhl@apache.org> wrote:
> > > 
> > > > > 
> > > > > the first case listed in my mail
> > > > > parsedQuery =
> > > > > "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
> > > > > Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
> > > > > wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee
> > > > > edgefield:maillol edgefield:61 edgefield:r edgefield:grenelle
> > > > > edgefield:75007 edgefield:paris)~7)))"
> > > > 
> > > > 
> > > > > The OR is different: either all words must match wordfield OR all words
> > > > > must match edgefield, but no mix between the two fields is allowed.
> > > > 
> > > > 
> > > > It doesn't work this way. These two queries differ only in scoring/results
> > > > ordering, i.e. this query matches docs {wordfield:musee, edgefield:musee}
> > > > as well as {wordfield:musee, edgefield:maillol}, {wordfield:musee}, and
> > > > {edgefield:maillol}.
> > > > This explanation might be useful:
> > > > https://lucidworks.com/post/solr-boolean-operators/
> > > > Note: DisMax works like OR/| but takes the max instead of the sum as the
> > > > score.
> > > > 
> > > > On Mon, Nov 13, 2023 at 7:21 PM elisabeth benoit <
> > > > elisaelisaelisa@gmail.com>
> > > > wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > Thanks for your answer.
> > > > > 
> > > > > I mean that in the second case listed in my mail, the query is
> > > > > parsedQuery =
> > > > > +(DisjunctionMaxQuery(((edgefield1:musee)^1.1 | edgefield2:musee))
> > > > > DisjunctionMaxQuery(((edgefield1:maillol)^1.1 | edgefield2:maillol))
> > > > > DisjunctionMaxQuery(((edgefield1:61)^1.1 | edgefield2:61))
> > > > > DisjunctionMaxQuery(((edgefield1:r)^1.1 | edgefield2:r))
> > > > > DisjunctionMaxQuery(((edgefield1:grenelle)^1.1 | edgefield2:grenelle))
> > > > > DisjunctionMaxQuery(((edgefield1:75007)^1.1 | edgefield2:75007))
> > > > > DisjunctionMaxQuery(((edgefield1:paris)^1.1 | edgefield2:paris)))~7
> > > > > 
> > > > > and so the way I read it is: "musee" can match edgefield1 OR edgefield2,
> > > > > "maillol" can match edgefield1 OR edgefield2, and so on, so Solr can
> > > > > return a doc where some query words match edgefield1 and other query
> > > > > words match edgefield2.
> > > > > 
> > > > > But in the first case listed in my mail
> > > > > 
> > > > > parsedQuery =
> > > > > "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
> > > > > Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
> > > > > wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee
> > > > > edgefield:maillol edgefield:61 edgefield:r edgefield:grenelle
> > > > > edgefield:75007 edgefield:paris)~7)))"
> > > > > 
> > > > > The OR is different: either all words must match wordfield OR all words
> > > > > must match edgefield, but no mix between the two fields is allowed.
> > > > > 
> > > > > So I cannot search both fields at the same time.
> > > > > 
> > > > > I hope this is clear!
> > > > > 
> > > > > I would like to search both fields in same query.
> > > > > 
> > > > > Best regards,
> > > > > Elisabeth
> > > > > 
> > > > > On Mon, Nov 13, 2023 at 17:02, Mikhail Khludnev <mkhl@apache.org> wrote:
> > > > > 
> > > > > > Hello Elisabeth.
> > > > > > DisMax analyzes user input across the given qf fields. If the number of
> > > > > > resulting tokens differs between fields, it can't apply the default
> > > > > > logic (a per-word sum over per-field maximums) and flips to a max over
> > > > > > per-field sums. The good news is that the difference between the two
> > > > > > approaches is only scoring.
> > > > > > What do you mean exactly by the absence of "matching words to be in two
> > > > > > different fields"?
> > > > > > 
> > > > > > On Mon, Nov 13, 2023 at 5:01 PM elisabeth benoit <
> > > > > > elisaelisaelisa@gmail.com>
> > > > > > wrote:
> > > > > > 
> > > > > > > Hello,
> > > > > > > 
> > > > > > > I am using solr 7.3.1 with ExtendedDismaxQParser.
> > > > > > > 
> > > > > > > I have an edge n-grams field and a normal text field. When I mix those
> > > > > > > two in the same query, i.e. *qf=edgefield wordfield*, and use the
> > > > > > > option *debugQuery=on*, I see that the parsedQuery is different, i.e.
> > > > > > > all words must match the same field.
> > > > > > > 
> > > > > > > i.e. parsedQuery =
> > > > > > > 
> > > > > > > "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
> > > > > > > Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
> > > > > > > wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee
> > > > > > > edgefield:maillol edgefield:61 edgefield:r edgefield:grenelle
> > > > > > > edgefield:75007 edgefield:paris)~7)))"
> > > > > > > 
> > > > > > > When I instead use two edge fields with *qf=edgefield1 edgefield2*:
> > > > > > > 
> > > > > > > parsedQuery =
> > > > > > > +(DisjunctionMaxQuery(((edgefield1:musee)^1.1 | edgefield2:musee))
> > > > > > > DisjunctionMaxQuery(((edgefield1:maillol)^1.1 | edgefield2:maillol))
> > > > > > > DisjunctionMaxQuery(((edgefield1:61)^1.1 | edgefield2:61))
> > > > > > > DisjunctionMaxQuery(((edgefield1:r)^1.1 | edgefield2:r))
> > > > > > > DisjunctionMaxQuery(((edgefield1:grenelle)^1.1 | edgefield2:grenelle))
> > > > > > > DisjunctionMaxQuery(((edgefield1:75007)^1.1 | edgefield2:75007))
> > > > > > > DisjunctionMaxQuery(((edgefield1:paris)^1.1 | edgefield2:paris)))~7
> > > > > > > 
> > > > > > > In the second case, edismax allows matching words to be in two
> > > > > > > different fields, but not in the first case.
> > > > > > > 
> > > > > > > Is there a way to get the same behaviour, i.e. case two, in all cases?
> > > > > > > 
> > > > > > > best regards,
> > > > > > > Elisabeth
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > > --
> > > > > > Sincerely yours
> > > > > > Mikhail Khludnev
> > > > > > 
> > > > > 
> > > > 
> > > > 
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> > > > 
> > > 
> > 
> > 
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > 
> 


