
List:       lucene-user
Subject:    Re: Scoring Across Multiple Fields
From:       Michael Froh <msfroh () gmail ! com>
Date:       2020-01-27 22:17:21
Message-ID: CACcAGTTPJZqwu0P0Dhc3g2RL1bHOG0i4mvgoXSaGYLz7TeVhVg () mail ! gmail ! com


Hi John,

A TermQuery produces a scorer that can compute similarity for a given term
value against a given field, in the context of the index, so as you say, it
produces a score for one field.

If you want to match a given term value across multiple fields, indeed you
could use a BooleanQuery with the TermQueries in SHOULD clauses. The
vanilla BooleanQuery produces a score which is the sum of all matching
clauses' scores (or at least that's the interpretation I get from reading
the source code of the explain() method in BooleanWeight).

You can also look into DisjunctionMaxQuery, which works like a disjunctive
BooleanQuery, but it returns the maximum score across matching clauses. The
idea here is that if, say, you're matching across title and body fields, a
title match may score higher (perhaps because it's been boosted). If you
sum the scores across fields, you're likely just inflating those title
matches even more (since a title match is probably highly correlated with a
body match). (The DisjunctionMaxQuery also has an optional
"tieBreakerMultiplier" property that you can use to weight the scoring
somewhere between pure max and pure sum -- like "Use the maximum score,
plus 0.001 times the sum of the rest".)
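
To make the difference concrete, here is a rough sketch in plain Java of the
two ways of combining per-field scores. It is not Lucene code: the class and
method names are made up for illustration, and the title/body scores in
main() are hypothetical numbers. It only mirrors the arithmetic described
above, on the assumption that clause scores are non-negative.

```java
// Illustrative only: not part of Lucene. Names and sample values are hypothetical.
class FieldScoreCombiner {

    // Sum of all matching clauses' scores, as described for a BooleanQuery
    // whose TermQueries are SHOULD clauses.
    static double sumScore(double[] clauseScores) {
        double sum = 0.0;
        for (double s : clauseScores) {
            sum += s;
        }
        return sum;
    }

    // Maximum clause score plus tieBreaker times the sum of the other scores,
    // as described for DisjunctionMaxQuery. Assumes non-negative scores.
    // tieBreaker = 0 gives pure max; tieBreaker = 1 gives pure sum.
    static double disMaxScore(double[] clauseScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : clauseScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical scores for one document: a boosted title match
        // and a weaker body match for the same term.
        double[] scores = {2.0, 0.5};

        System.out.println("sum:            " + sumScore(scores));
        System.out.println("dismax (tie 0): " + disMaxScore(scores, 0.0));
        System.out.println("dismax (tie 1): " + disMaxScore(scores, 1.0));
    }
}
```

With a tie-breaker of 0 the body match adds nothing to the document's score;
as the tie-breaker approaches 1 the result approaches the plain sum.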

Hope that helps,
Michael

On Mon, 27 Jan 2020 at 13:37, John Brown <brown.john@temple.edu> wrote:

> Hi,
>
> I have a question regarding how Lucene computes document similarities from
> field similarities.
>
> Lucene's scoring documentation mentions that scoring works on fields and
> combines the results to return documents. I'm assuming fields are given
> scores, and those scores are simply averaged to return the document score?
>
> If this is the case, then in order to incorporate multiple fields in my
> scoring, I would use multiple term queries that contain the same term, but
> target different fields, then I would simply put them in a boolean query,
> and search my index using this boolean query.
>
> Am I going about this in the correct way? Any clarification would be
> greatly appreciated.
>
> Thank you,
> John B
>

