List:       openmrs-dev
Subject:    Re: TRUNK-3835: Search for "of" in concept word
From:       Rafal Korytkowski <rafal () openmrs ! org>
Date:       2012-12-31 9:54:04
Message-ID: CAKNp=UOXS2NXyeexBgAeWX+bJwUtfXg07GoUZJjwCvngfz7Cng () mail ! gmail ! com

ConceptName is indexed with StandardAnalyzer, which skips basic English
stop words. A query parser created by LuceneQuery should be using the same
analyzer and also removing stop words, but apparently it behaves
differently.

A possible quick fix is
http://stackoverflow.com/questions/12192879/lucene-stop-word-filter
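
As a rough illustration of that kind of quick fix (a sketch only, not the
actual code in HibernateConceptDAO; the class name, the method name and the
short stop-word list are made up for the example), the search phrase could be
stripped of stop words before the Lucene query string is built:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of OpenMRS or Lucene.
class StopWordPhraseFilter {

    // Illustrative subset only; a real list would have to match
    // whatever the index-time analyzer removes.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Splits the phrase on whitespace and drops stop words (case-insensitive).
    static List<String> removeStopWords(String phrase) {
        List<String> kept = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields a single empty token
            }
            if (!STOP_WORDS.contains(word.toLowerCase(Locale.ENGLISH))) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

With this, a search for "Tuberculosis Of" would contribute only the word
"Tuberculosis" to the query, so no clause would be generated for "of".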

In the long run we want to have our own custom analyzer/stop filter that
skips stop words based on the locale.
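
To make the locale-based idea a bit more concrete, here is a minimal sketch
in plain Java. It is an assumption about how such a lookup might be shaped,
not an existing OpenMRS or Lucene class; LocaleStopWords and its methods are
hypothetical names, and the word lists would have to come from somewhere
real:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical registry, not an OpenMRS or Lucene class.
class LocaleStopWords {

    // Stop words keyed by language code, so en_GB and en_US share the "en" list.
    private final Map<String, Set<String>> byLanguage = new HashMap<>();

    // Adds stop words for the language of the given locale.
    void register(Locale locale, String... words) {
        Set<String> set = byLanguage.get(locale.getLanguage());
        if (set == null) {
            set = new HashSet<>();
            byLanguage.put(locale.getLanguage(), set);
        }
        for (String word : words) {
            set.add(word.toLowerCase(locale));
        }
    }

    // True if the word is a registered stop word for the locale's language.
    boolean isStopWord(Locale locale, String word) {
        Set<String> set = byLanguage.get(locale.getLanguage());
        return set != null && set.contains(word.toLowerCase(locale));
    }

    // Returns the words that are not stop words in the given locale.
    List<String> filter(Locale locale, List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!isStopWord(locale, word)) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

The same lookup would need to be applied both when indexing and when building
the query, otherwise the two sides disagree again.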


-Rafał


On 31 December 2012 10:21, Gurpreet Luthra <gluthra@thoughtworks.com> wrote:

> Thanks Rafal. This helped. I have been able to set up the hibernate-search
> branch, and also reproduce the issue.
> 
> On closer inspection it seems that the "Stop Words" are not being removed
> from the phrase that is constructed for Lucene Search. Here is how the
> query looks:
> 
> +name:( +(tuberculo~ tuberculo*^2) +(of~ of*^2) "tuberculo of"^1000)
> +locale:(en* ) +voided:false
> 
> If I filter out the "OF" word from the phrase in the
> "newRequirePartialWordsSearchPhrase( )" method of HibernateConceptDAO.java,
> then it works fine for "Tuberculosis Of" search.
> 
> Do you have any input on what might be a good fix? The stop words are
> locale-specific, so I guess I will need to look at the search locale,
> extract the stop words for that locale, and then remove them from the
> search phrase. That could be one approach.
> 
> On reading briefly about Hibernate Search, there seems to be the concept
> of "Analyzers" which have filters for Stop Words. Could something like this
> be used?
> 
> http://docs.jboss.org/hibernate/search/4.2/reference/en-US/html/search-mapping.html#d0e3461
> (Search for "StopFilterFactory" on this page.)
> 
> Do you (or anyone else) already have thoughts on a recommended strategy
> for fixing this? Any recommendations would help.
> 
> Thank you.
> 
> 
> On Sun, Dec 30, 2012 at 9:30 PM, Rafal Korytkowski <rafal@openmrs.org> wrote:
> 
> > Thanks for your interest in working on this branch. It's indeed a new
> > take on concept searching. We decided to use hibernate-search [0], which
> > couples Hibernate with Lucene in a very seamless way.
> > 
> > You can easily see modifications in core at [1]. It's a good starting
> > point to see how things have changed and where a fix may go. I'd recommend
> > trying this branch using the MVP dictionary (it's packaged with a
> > standalone, for instance). Unfortunately, the demo server [2] seems to be
> > down at the moment, so you can't see it live right away.
> > 
> > The work has been discussed during one of the recent dev calls. You can
> > find notes and a recording at
> > https://wiki.openmrs.org/display/RES/2012-12-06+Developers+Forum
> > 
> > If you have any further questions, please let me know.
> > 
> > [0] - http://www.hibernate.org/subprojects/search.html
> > [1] - https://github.com/openmrs/openmrs-core/pull/126/files
> > [2] - http://gw65.iu.xsede.org:8080/openmrs-hibernate-search
> > 
> > 
> > -Rafał
> > 
> > 
> > On 29 December 2012 11:22, Gurpreet Luthra <gluthra@thoughtworks.com> wrote:
> > 
> > > Thanks Rafal. After reading your reply, I realized the Fix version says
> > > "hibernate-search".
> > > 
> > > Is there some reading material on this branch? From my searches on the
> > > net and JIRA, it seems OpenMRS is reimplementing concept search (using
> > > Hibernate/Lucene?). Can you shed some more light on this?
> > > 
> > > I will try out this branch soon. Thanks!
> > > 
> > > 
> > > On Fri, Dec 28, 2012 at 7:54 PM, Rafal Korytkowski <rafal@openmrs.org> wrote:
> > > 
> > > > Hi Gurpreet,
> > > > 
> > > > This issue affects the hibernate-search branch, and the fix also needs
> > > > to be made there. The branch is not the master branch that you get
> > > > right after cloning the openmrs-core repository; rather, you need to
> > > > check out hibernate-search.
> > > > 
> > > > The branch can be previewed at
> > > > https://github.com/openmrs/openmrs-core/tree/hibernate-search
> > > > 
> > > > 
> > > > -Rafał
> > > > 
> > > > 
> > > > On 28 December 2012 14:20, Gurpreet Luthra <gluthra@thoughtworks.com> wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > I was looking at High priority issues to be picked up for development,
> > > > > and came upon this issue:
> > > > > https://tickets.openmrs.org/browse/TRUNK-3835 (Fix searching of "of"
> > > > > in concept word).
> > > > > 
> > > > > I haven't been able to reproduce it. I searched in Concept dictionary
> > > > > using the word "of" and it worked fine. The word "of" was already in my
> > > > > "Stop Words" in OpenMRS, so I guess that's why it works ok.
> > > > > 
> > > > > The issue says that we should add "of" to default list. That part I
> > > > > didn't understand. "Of" was already in the default list that I installed.
> > > > > Did that happen because I imported DEMO DB?
> > > > > 
> > > > > What needs to be done for that issue? Or is it already fixed?
> > > > > 
> > > > > I wasn't sure if I should post this question in JIRA or here on the
> > > > > mailing list. I felt that since I had no idea about the issue, maybe
> > > > > posting it here is better -- and then if the mail results in some clarity,
> > > > > I will update the issue with comments.
> > > > > 
> > > > > Thank you!
> > > > > --
> > > > > Regards
> > > > > Gurpreet
> > > > > (M) +91-7798987124
> > > > > 
> > > > > Join the Beach & Volunteer Space
> > > > > <https://my.thoughtworks.com/groups/the-beach-and-volunteer-program>
> > > > > to help OpenMRS, RapidFTR and Camfed.
> > > > > We are looking for Volunteers!
> > > > > 
> > > > > --
> > > > > OpenMRS Developers: http://go.openmrs.org/dev
> > > > > Post: dev@openmrs.org
> > > > > Unsubscribe: dev+unsubscribe@openmrs.org
> > > > > Manage your OpenMRS subscriptions at https://id.openmrs.org/
> > > > > 
> > > > > 
> > > > > 



