List:       lucene-user
Subject:    Re: QueryParser refactoring
From:       Erik Hatcher <erik () ehatchersolutions ! com>
Date:       2005-03-08 8:45:49
Message-ID: 76660b51560499b1f343a32fd7d99adc () ehatchersolutions ! com

On Mar 8, 2005, at 2:29 AM, Morus Walter wrote:
> Erik Hatcher writes:
>>> Your changes look great in general, though I find some issues:
>>>
>>> 1) 'stop OR stop AND stop' where stop is a stopword gives a parse
>>> error:
>>> Encountered "<EOF>" at line 1, column 0.
>>> Was expecting one of:
>>>     <NOT> ...
>>> ...
>>
>> I think you must have tried this in a transient state when I forgot to
>> check in some JavaCC generated files.  Try again.  This one now returns
>> an empty BooleanQuery.
>>
> ok.
> I'm a bit puzzled, since I called javacc myself, so generated files
> should not matter, but if it's fixed, I don't care about what went wrong.

Let me know if there is still an issue, though I added this exact case 
to TestPrecedenceQueryParser and it's currently working for me.

>>> 2) Single term queries using +/- flags are parsed to a query without
>>> flag
>>> +a -> a
>>
>> Hmmm.... this is a debatable one.  It's returning a TermQuery in this
>> case for "a".  Is that appropriate?  Or should it return a 
>> BooleanQuery
>> with a single TermQuery as required?
>>
> I'd prefer, if query parser parses queries created by query.toString()
> to the same query. But that's just a nice to have.

It's also impossible in general.  Here's a simple example: take a 
Query that is equivalent to A OR B, whose .toString() yields "A B", 
then parse that with the default operator set to AND and you'll get 
"+A +B".  For my current daytime project I created a modified 
Query->String converter (I use a String representation for the 
most-recently-used drop-down, stored as a client-side cookie) that 
explicitly puts "OR" between SHOULD BooleanClauses.
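
As a rough illustration of that idea (not the actual converter from 
that project), here is a minimal sketch.  The Term, Bool, Clause and 
Occur types are hypothetical stand-ins for Lucene's TermQuery, 
BooleanQuery and BooleanClause so the example runs on its own; the 
only point is that two adjacent SHOULD clauses are joined with an 
explicit "OR" instead of a bare space.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; these are NOT Lucene classes.
final class ExplicitOrFormatter {
    enum Occur { MUST, SHOULD, MUST_NOT }

    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Clause {
        final Occur occur;
        final Node node;
        Clause(Occur occur, Node node) { this.occur = occur; this.node = node; }
    }

    static final class Bool implements Node {
        final List<Clause> clauses;
        Bool(Clause... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Renders a query tree, writing "OR" between adjacent SHOULD clauses so
    // the string does not depend on the parser's default operator.
    static String format(Node node) {
        if (node instanceof Term) {
            return ((Term) node).text;
        }
        StringBuilder sb = new StringBuilder();
        Occur previous = null;
        for (Clause clause : ((Bool) node).clauses) {
            if (sb.length() > 0) {
                boolean bothOptional =
                    previous == Occur.SHOULD && clause.occur == Occur.SHOULD;
                sb.append(bothOptional ? " OR " : " ");
            }
            if (clause.occur == Occur.MUST) {
                sb.append('+');
            } else if (clause.occur == Occur.MUST_NOT) {
                sb.append('-');
            }
            boolean nested = clause.node instanceof Bool;
            if (nested) sb.append('(');
            sb.append(format(clause.node));
            if (nested) sb.append(')');
            previous = clause.occur;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node aOrB = new Bool(new Clause(Occur.SHOULD, new Term("A")),
                             new Clause(Occur.SHOULD, new Term("B")));
        System.out.println(format(aOrB)); // prints: A OR B
    }
}
```

This only covers the A OR B case discussed here.  Mixed clause lists 
(say a MUST followed by a SHOULD) are still written with a plain 
space, so the sketch makes no claim of a full round trip.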

I still believe that we need to have some query-parser-specific way to 
build strings from Query objects, though I haven't thought through 
exactly how that should be designed.  For example, I'm building a very 
custom query parser for a client that looks nothing like QueryParser 
syntax.  It would be very nice to be able to turn a Query back around 
into their expression syntax.

>> I think having it optimized to a TermQuery makes the most sense.
>> Though, putting it in a BooleanQuery does make this next one 
>> simpler...
>>
>>> -a -> a
>>> While this doesn't make a difference for +a it's a bit strange for 
>>> -a,
>>> OTOH -a isn't a usable query anyway.
>>
>> Oops... yeah, you're right.  If it's a single clause right now it
>> doesn't wrap in a BooleanQuery and thus does not take into account the
>> modifier +/-/NOT.   But as you say, this is a bogus query anyway.  I
>> guess the right thing to do is wrap both the +a query as above and the
>> -a query into a BooleanQuery with the modifier set appropriately.
>>
> Ok.
> The question of how to handle BooleanQueries that contain prohibited
> terms only is a question of its own.
> In my fix I chose to silently drop these queries, basically because
> they are effectively dropped during querying anyway.

Silently drop as in you removed them entirely from the resultant Query?

That'd be easy enough to add - but is that what we want to happen?  
Community, thoughts?
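
To make the option concrete, here is a minimal sketch of what 
"dropping" could look like.  It is not Morus's fix and not anything 
in the parser today.  Term, Bool, Clause and Occur are hypothetical 
stand-ins for the Lucene classes, and the `dropped` list is an 
assumption added so a caller could tell the user that part of the 
query was ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; these are NOT Lucene classes.
final class ProhibitedOnlyPruner {
    enum Occur { MUST, SHOULD, MUST_NOT }

    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Clause {
        final Occur occur;
        final Node node;
        Clause(Occur occur, Node node) { this.occur = occur; this.node = node; }
    }

    static final class Bool implements Node {
        final List<Clause> clauses;
        Bool(List<Clause> clauses) { this.clauses = clauses; }
    }

    // Returns the query with every boolean sub-query that has no MUST or
    // SHOULD clause removed, or null if the whole query is removed.
    // Each removed sub-query is appended to `dropped`.
    static Node prune(Node node, List<Node> dropped) {
        if (!(node instanceof Bool)) {
            return node; // plain terms are always kept
        }
        List<Clause> kept = new ArrayList<>();
        boolean hasPositive = false;
        for (Clause clause : ((Bool) node).clauses) {
            Node child = prune(clause.node, dropped);
            if (child == null) {
                continue; // child was prohibited-only and has been recorded
            }
            kept.add(new Clause(clause.occur, child));
            if (clause.occur != Occur.MUST_NOT) {
                hasPositive = true;
            }
        }
        if (!hasPositive) {
            dropped.add(node); // nothing left that could match a document
            return null;
        }
        return new Bool(kept);
    }
}
```

A caller would pass in an empty list and check afterwards whether it 
is still empty; a non-empty list means something was dropped and the 
user can be told.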

> In an application, I handled this by dropping the query and notifying 
> the
> user, that some part of the query could not be handled and was ignored.

How did your application notice that part of the query was dropped?

>>> 3) a OR NOT b parses to 'a -b' which is the same as 'a AND NOT b'
>>>    IMHO `a OR NOT b' should be `a OR (NOT b)' though lucene cannot
>>> search
>>>    that. Maybe it should raise an error...
>>
>> Actually it parses like this:
>>
>> 	a OR NOT b -> a -b
>> 	a AND NOT b -> +a -b
>>
>> So they are slightly different, though the effect will be the same.
>>
>>>    a OR NOT b AND c (parsed to a -(+b +c)) should IMHO be parsed to 
>>> `a
>>> (-b +c)'
>>
>> Ah, ok.... so NOT gets much higher precedence than I'm currently 
>> giving
>> it.  That might take me a while to achieve, but I'll give it a shot.
>>
> Great.

I've shifted my local parser grammar around some, and have broken other 
tests, but do have the NOT precedence working.  Here's a testSimple 
case that I broke by making NOT have higher precedence (I shifted where 
Modifiers are taken into account - before a Clause now):

	Query /+term -term term/ yielded /(+term) (-term) term/, expecting 
/+term -term term/

As you can see this is wrong and I have more work to do.  A OR NOT B 
now parses to A (-B) though, which I too now believe is a more correct 
(though invalid) interpretation.
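
For anyone following along, the intended precedence (NOT binds 
tighter than AND, which binds tighter than OR) can be sketched with a 
tiny recursive-descent parser.  This is not the JavaCC grammar; it is 
a standalone toy that assumes whitespace-separated tokens, supports 
only bare terms plus the AND/OR/NOT keywords (no +/- modifiers, no 
parentheses in the input), and renders straight to a string.

```java
import java.util.ArrayList;
import java.util.List;

// Toy parser, NOT Lucene's grammar: terms and AND/OR/NOT keywords only.
final class NotPrecedenceSketch {
    /** A rendered sub-query; compound ones get parentheses when nested. */
    private static final class Group {
        final String text;
        final boolean compound;
        Group(String text, boolean compound) { this.text = text; this.compound = compound; }
    }

    private final String[] tokens;
    private int pos = 0;

    private NotPrecedenceSketch(String input) {
        this.tokens = input.trim().split("\\s+");
    }

    static String parse(String input) {
        NotPrecedenceSketch p = new NotPrecedenceSketch(input);
        String text = p.orExpr();
        if (p.pos != p.tokens.length) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens[p.pos]);
        }
        return text;
    }

    private boolean accept(String keyword) {
        if (pos < tokens.length && tokens[pos].equals(keyword)) {
            pos++;
            return true;
        }
        return false;
    }

    // orExpr := andExpr ( "OR" andExpr )*   (lowest precedence)
    private String orExpr() {
        List<Group> operands = new ArrayList<>();
        do {
            operands.add(andExpr());
        } while (accept("OR"));
        if (operands.size() == 1) {
            return operands.get(0).text;
        }
        StringBuilder sb = new StringBuilder();
        for (Group g : operands) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(g.compound ? "(" + g.text + ")" : g.text);
        }
        return sb.toString();
    }

    // andExpr := notExpr ( "AND" notExpr )*
    private Group andExpr() {
        List<String> terms = new ArrayList<>();
        List<Boolean> negated = new ArrayList<>();
        do {
            negated.add(accept("NOT")); // notExpr := [ "NOT" ] term (highest precedence)
            terms.add(term());
        } while (accept("AND"));
        if (terms.size() == 1) {
            // A lone NOT operand becomes its own prohibited-only group.
            return negated.get(0) ? new Group("-" + terms.get(0), true)
                                  : new Group(terms.get(0), false);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(negated.get(i) ? '-' : '+').append(terms.get(i));
        }
        return new Group(sb.toString(), true);
    }

    private String term() {
        if (pos >= tokens.length || tokens[pos].isEmpty()
                || tokens[pos].equals("AND") || tokens[pos].equals("OR")
                || tokens[pos].equals("NOT")) {
            throw new IllegalArgumentException("Expected a term at token " + pos);
        }
        return tokens[pos++];
    }

    public static void main(String[] args) {
        System.out.println(parse("a OR NOT b AND c")); // prints: a (-b +c)
        System.out.println(parse("A OR NOT B"));       // prints: A (-B)
        System.out.println(parse("a AND NOT b"));      // prints: +a -b
    }
}
```

Because NOT is consumed inside the AND level, the negation attaches 
to the single term that follows it rather than to everything after 
the OR, which is what gives a (-b +c) instead of a -(+b +c).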

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
