List: lucene-user
Subject: Re: Scoring Across Multiple Fields
From: Michael Froh <msfroh () gmail ! com>
Date: 2020-01-27 22:17:21
Message-ID: CACcAGTTPJZqwu0P0Dhc3g2RL1bHOG0i4mvgoXSaGYLz7TeVhVg () mail ! gmail ! com
Hi John,
A TermQuery produces a scorer that can compute similarity for a given term
value against a given field, in the context of the index, so as you say, it
produces a score for one field.
If you want to match a given term value across multiple fields, indeed you
could use a BooleanQuery with the TermQueries in SHOULD clauses. The
vanilla BooleanQuery produces a score which is the sum of all matching
clauses' scores (or at least that's the interpretation I get from reading
the source code of the explain() method in BooleanWeight).
You can also look into DisjunctionMaxQuery, which works like a disjunctive
BooleanQuery, but it returns the maximum score across matching clauses. The
idea here is that if, say, you're matching across title and body fields, a
title match may score higher (perhaps because it's been boosted). If you
sum the scores across fields, you're likely just inflating those title
matches even more (since a title match is probably highly correlated with a
body match). (The DisjunctionMaxQuery also has an optional
"tieBreakerMultiplier" property that you can use to weight the scoring
somewhere between pure max and pure sum -- like "Use the maximum score,
plus 0.001 times the sum of the rest".)
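
To make the difference concrete, here is a minimal plain-Java sketch of the two combination rules described above. It does not call Lucene at all: the class and method names (ScoreCombination, sumScore, disMaxScore) and the per-field scores are hypothetical, chosen only for illustration. In Lucene itself the corresponding queries would be built with BooleanQuery.Builder (adding each TermQuery with BooleanClause.Occur.SHOULD) and with the DisjunctionMaxQuery constructor that takes the clauses plus a tieBreakerMultiplier.

```java
// Plain-Java illustration of the two score-combination rules.
// No Lucene dependency; all names and numbers here are hypothetical.
final class ScoreCombination {

    // BooleanQuery-style: the sum of the scores of all matching SHOULD clauses.
    static float sumScore(float[] clauseScores) {
        float sum = 0f;
        for (float s : clauseScores) {
            sum += s;
        }
        return sum;
    }

    // DisjunctionMaxQuery-style: the maximum clause score, plus
    // tieBreakerMultiplier times the sum of the remaining clause scores.
    static float disMaxScore(float[] clauseScores, float tieBreakerMultiplier) {
        if (clauseScores.length == 0) {
            return 0f;
        }
        float max = Float.NEGATIVE_INFINITY;
        float sum = 0f;
        for (float s : clauseScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tieBreakerMultiplier * (sum - max);
    }

    public static void main(String[] args) {
        // Assumed scores for one document: a boosted title match and a body match.
        float[] scores = {2.0f, 0.5f};

        System.out.println("sum:              " + sumScore(scores));
        System.out.println("dismax, tie=0:    " + disMaxScore(scores, 0f));
        System.out.println("dismax, tie=0.001: " + disMaxScore(scores, 0.001f));
        System.out.println("dismax, tie=1:    " + disMaxScore(scores, 1f));
    }
}
```

With a tie breaker of 0 the result is the pure maximum, and with a tie breaker of 1 it equals the plain sum, so values in between slide between the two behaviours.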
Hope that helps,
Michael
On Mon, 27 Jan 2020 at 13:37, John Brown <brown.john@temple.edu> wrote:
> Hi,
>
> I have a question regarding how Lucene computes document similarities from
> field similarities.
>
> Lucene's scoring documentation mentions that scoring works on fields and
> combines the results to return documents. I'm assuming fields are given
> scores, and those scores are simply averaged to return the document score?
>
> If this is the case, then in order to incorporate multiple fields in my
> scoring, I would use multiple term queries that contain the same term, but
> target different fields, then I would simply put them in a boolean query,
> and search my index using this boolean query.
>
> Am I going about this in the correct way? Any clarification would be
> greatly appreciated.
>
> Thank you,
> John B
>