List: lucene-user
Subject: Re: QueryParser refactoring
From: Erik Hatcher <erik () ehatchersolutions ! com>
Date: 2005-03-08 8:45:49
Message-ID: 76660b51560499b1f343a32fd7d99adc () ehatchersolutions ! com
On Mar 8, 2005, at 2:29 AM, Morus Walter wrote:
> Erik Hatcher writes:
>>> Your changes look great in general, though I find some issues:
>>>
>>> 1) 'stop OR stop AND stop' where stop is a stopword gives a parse
>>> error:
>>> Encountered "<EOF>" at line 1, column 0.
>>> Was expecting one of:
>>> <NOT> ...
>>> ...
>>
>> I think you must have tried this in a transient state when I forgot
>> to check in some JavaCC generated files. Try again. This one now
>> returns an empty BooleanQuery.
>>
> ok.
> I'm a bit puzzled, since I called javacc myself, so generated files
> should not matter, but if it's fixed, I don't care about what went
> wrong.
Let me know if there is still an issue, though I added this exact case
to TestPrecedenceQueryParser and it's currently working for me.
>>> 2) Single term queries using +/- flags are parsed to a query
>>> without the flag
>>> +a -> a
>>
>> Hmmm.... this is a debatable one. It's returning a TermQuery in this
>> case for "a". Is that appropriate? Or should it return a
>> BooleanQuery
>> with a single TermQuery as required?
>>
> I'd prefer, if query parser parses queries created by query.toString()
> to the same query. But that's just a nice to have.
It's also impossible in general. Here's a simple example: take a
Query that is equivalent to A OR B, whose .toString is "A B", then parse
that with the default operator set to AND and you'll get "+A +B". I
created a modified Query->String converter for my current daytime
project (I use a String representation for the most-recently-used
drop-down, which is stored as a client-side cookie) that explicitly puts
"OR" between SHOULD BooleanClauses.
I still believe that we need to have some query-parser-specific way to
build strings from Query objects, though I haven't thought through
exactly how that should be designed. For example, I'm building a very
custom query parser for a client that looks nothing like QueryParser
syntax. It would be very nice to be able to turn a Query back around
into their expression syntax.
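To illustrate the explicit-OR idea, here is a minimal sketch. It is not the converter mentioned above and does not use Lucene's classes: `Occur`, `Clause`, and `ExplicitOrFormatter` are hypothetical stand-ins invented for this example, and only flat lists of term clauses are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical formatter: joins adjacent optional (SHOULD) clauses with an
// explicit "OR" instead of relying on the parser's default operator.
final class ExplicitOrFormatter {
    static String format(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        Occur previous = null;
        for (Clause c : clauses) {
            if (previous != null) {
                boolean bothOptional = previous == Occur.SHOULD && c.occur == Occur.SHOULD;
                sb.append(bothOptional ? " OR " : " ");
            }
            if (c.occur == Occur.MUST) sb.append('+');      // required clause
            if (c.occur == Occur.MUST_NOT) sb.append('-');  // prohibited clause
            sb.append(c.term);
            previous = c.occur;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Clause> clauses = new ArrayList<>();
        clauses.add(new Clause("A", Occur.SHOULD));
        clauses.add(new Clause("B", Occur.SHOULD));
        System.out.println(format(clauses)); // prints: A OR B
    }
}

// Stand-in for a clause modifier; an assumption for this sketch, not Lucene's type.
enum Occur { MUST, SHOULD, MUST_NOT }

// Stand-in for a single term clause.
final class Clause {
    final String term;
    final Occur occur;
    Clause(String term, Occur occur) { this.term = term; this.occur = occur; }
}
```

With the explicit "OR" in the string, "A OR B" no longer turns into "+A +B" when the default operator is AND. A lone SHOULD clause next to a required one (for example "+A B") would still be ambiguous in this sketch, which is one reason a parser-specific way of building strings seems necessary.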
>> I think having it optimized to a TermQuery makes the most sense.
>> Though, putting it in a BooleanQuery does make this next one
>> simpler...
>>
>>> -a -> a
>>> While this doesn't make a difference for +a it's a bit strange for
>>> -a,
>>> OTOH -a isn't a usable query anyway.
>>
>> Oops... yeah, you're right. If it's a single clause right now it
>> doesn't wrap in a BooleanQuery and thus does not take into account the
>> modifier +/-/NOT. But as you say, this is a bogus query anyway. I
>> guess the right thing to do is wrap both the +a query as above and the
>> -a query into a BooleanQuery with the modifier set appropriately.
>>
> Ok.
> The question of how to handle BooleanQueries that contain only
> prohibited terms is a question on its own.
> In my fix I chose to silently drop these queries, basically because
> they are effectively dropped during querying anyway.
Silently drop as in you removed them entirely from the resultant Query?
That'd be easy enough to add - but is that what we want to happen?
Community, thoughts?
> In an application, I handled this by dropping the query and notifying
> the
> user, that some part of the query could not be handled and was ignored.
How did your application notice that part of the query was dropped?
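For concreteness, one way an application could both drop a prohibited-only group and know that it did so is sketched below. This is only an assumption about how such a check might look, not a description of either implementation discussed here; `ProhibitedOnlyCheck`, its `Clause` type, and the warning text are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ProhibitedOnlyCheck {
    // Hypothetical clause model: a term plus a flag for the '-' / NOT modifier.
    static final class Clause {
        final String term;
        final boolean prohibited;
        Clause(String term, boolean prohibited) {
            this.term = term;
            this.prohibited = prohibited;
        }
    }

    // True when there is at least one clause and every clause is prohibited.
    static boolean isProhibitedOnly(List<Clause> clauses) {
        if (clauses.isEmpty()) return false;
        for (Clause c : clauses) {
            if (!c.prohibited) return false;
        }
        return true;
    }

    // Returns the clauses to keep. When the whole group is dropped, a message
    // is appended to 'warnings' so the caller can tell the user about it.
    static List<Clause> dropIfProhibitedOnly(List<Clause> clauses, List<String> warnings) {
        if (isProhibitedOnly(clauses)) {
            warnings.add("Ignored a query part that contains only prohibited terms");
            return new ArrayList<>();
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        List<Clause> clauses = new ArrayList<>();
        clauses.add(new Clause("a", true)); // corresponds to the query "-a"
        List<Clause> kept = dropIfProhibitedOnly(clauses, warnings);
        System.out.println(kept.size() + " clause(s) kept, " + warnings.size() + " warning(s)");
    }
}
```

Collecting warnings on the side keeps the drop from being entirely silent, while still leaving it up to the caller whether to show anything to the user.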
>>> 3) a OR NOT b parses to 'a -b' which is the same as 'a AND NOT b'
>>> IMHO `a OR NOT b' should be `a OR (NOT b)' though lucene cannot
>>> search
>>> that. Maybe it should raise an error...
>>
>> Actually it parses like this:
>>
>> a OR NOT b -> a -b
>> a AND NOT b -> +a -b
>>
>> So they are slightly different, though the effect will be the same.
>>
>>> a OR NOT b AND c (parsed to a -(+b +c)) should IMHO be parsed to
>>> `a (-b +c)'
>>
>> Ah, ok.... so NOT gets much higher precedence than I'm currently
>> giving it. That might take me a while to achieve, but I'll give it a
>> shot.
>>
> Great.
I've shifted my local parser grammar around some, and have broken other
tests, but do have the NOT precedence working. Here's a testSimple
case that I broke by making NOT have higher precedence (I shifted where
Modifiers are taken into account - before a Clause now):
Query /+term -term term/ yielded /(+term) (-term) term/, expecting
/+term -term term/
As you can see this is wrong and I have more work to do. A OR NOT B
now parses to A (-B) though, which I too now believe is a more correct
(though invalid) interpretation.
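The precedence being discussed (NOT binds tighter than AND, which binds tighter than OR) can be sketched with a small recursive-descent parser. This is not the JavaCC grammar and says nothing about how the real parser is structured; `PrecedenceSketch` is a hypothetical toy that only understands whitespace-separated terms and the keywords AND, OR, and NOT (no +/- modifiers, no parentheses, no analysis), and it renders its result in the clause notation used in this thread.

```java
import java.util.ArrayList;
import java.util.List;

final class PrecedenceSketch {
    // Tiny syntax tree: 'T' term, 'N' not, 'A' and, 'O' or.
    static final class Node {
        final char kind;
        final String text;
        final List<Node> kids = new ArrayList<>();
        Node(char kind, String text) { this.kind = kind; this.text = text; }
    }

    private final String[] tokens;
    private int pos = 0;

    private PrecedenceSketch(String input) { this.tokens = input.trim().split("\\s+"); }

    static String parse(String input) {
        PrecedenceSketch p = new PrecedenceSketch(input);
        Node root = p.orExpr();
        if (p.pos < p.tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens[p.pos]);
        }
        return render(root);
    }

    private boolean peek(String keyword) {
        return pos < tokens.length && tokens[pos].equals(keyword);
    }

    // orExpr := andExpr ("OR" andExpr)*      lowest precedence
    private Node orExpr() {
        Node first = andExpr();
        if (!peek("OR")) return first;
        Node or = new Node('O', null);
        or.kids.add(first);
        while (peek("OR")) { pos++; or.kids.add(andExpr()); }
        return or;
    }

    // andExpr := notExpr ("AND" notExpr)*
    private Node andExpr() {
        Node first = notExpr();
        if (!peek("AND")) return first;
        Node and = new Node('A', null);
        and.kids.add(first);
        while (peek("AND")) { pos++; and.kids.add(notExpr()); }
        return and;
    }

    // notExpr := "NOT"? TERM                 highest precedence
    private Node notExpr() {
        if (peek("NOT")) {
            pos++;
            Node not = new Node('N', null);
            not.kids.add(term());
            return not;
        }
        return term();
    }

    private Node term() {
        if (pos >= tokens.length) throw new IllegalArgumentException("unexpected end of query");
        return new Node('T', tokens[pos++]);
    }

    // Renders clause notation: '+' required, '-' prohibited, bare = optional.
    static String render(Node n) {
        if (n.kind == 'T') return n.text;
        if (n.kind == 'N') return "-" + render(n.kids.get(0));
        StringBuilder sb = new StringBuilder();
        for (Node k : n.kids) {
            if (sb.length() > 0) sb.append(' ');
            // Nested groups, and a negated operand of OR, get their own parentheses.
            boolean group = k.kind == 'A' || k.kind == 'O' || (n.kind == 'O' && k.kind == 'N');
            if (n.kind == 'A' && k.kind != 'N') sb.append('+'); // AND operands are required
            sb.append(group ? "(" + render(k) + ")" : render(k));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse("a OR NOT b AND c")); // prints: a (-b +c)
        System.out.println(parse("a OR NOT b"));       // prints: a (-b)
        System.out.println(parse("a AND NOT b"));      // prints: +a -b
    }
}
```

Because the sketch has no +/- modifier syntax, it does not reproduce the /+term -term term/ regression above; it only shows how giving NOT its own, innermost grammar level yields `a (-b +c)` and `A (-B)`.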
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org