[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kde-core-devel
Subject:    Re: Review Request: Make Sonnet::Speller::setLanguage threadsafe, fixing segfaults in krunner
From:       "Simeon Bird" <bladud () gmail ! com>
Date:       2012-09-11 15:06:33
Message-ID: 20120911150633.2895.13760 () vidsolbach ! de
[Download RAW message or body]

> On Sept. 6, 2012, 12:29 p.m., David Faure wrote:
> > A number of comments on the implementation, but also a more general comment: does \
> > it even make sense to use a Speller in multiple threads, and to  change the \
> > language from one thread while another one is using the speller for \
> > spell-checking? It sounds like, even if a mutex/lock can prevent crashes, this is \
> > a weird thing to do anyway, since you have no idea at which point the \
> > spell-checking will switch to another language... could happen in the very middle \
> > of a sentence... 
> > Maybe it would make more sense to "post" the language change operation to the \
> > spell-checking thread using the same mechanism as the one used to "post" \
> > spellchecking requests to it? (Disclaimer: I know nothing of the krunner \
> > architecture).
> 
> Simeon Bird wrote:
> Thanks for the review. 
> 
> I agree that this is a slightly weird thing to do, but, as you surmise, it was the \
> only way I could think of to make the feature work properly.  
> As far as I understand it (from reading the documentation), the way krunner works \
> is to call Plasma::AbstractRunner::match in a new thread for every runner every \
> time input is entered.  
> So if I enter:
> 
> "spell hello"
> 
> I will be called with 's' 'sp' 'spe'...and so on until I get the whole match, \
> without waiting for the previous threads to return. Thus match has to be \
> thread-safe. 
> The feature I'm trying to fix here is to be able to enter "spell french bonjour" \
> and have it spell something.  The runner is responsible for parsing the input and \
> checking whether it should do anything else. So we don't actually post \
> spellchecking requests, we just post some input, and detect that we should \
> spell-check, and change the language, by parsing strings. So I couldn't see a way \
> to do this except to call Sonnet::Speller::setLanguage() in \
> Plasma::AbstractRunner::match,  which meant that for it to work I had to make match \
> thread-safe.  
> Sorry. I'm open to suggestions if you have a better idea. 
> 
> I've fixed your other comments in v2 of the diff, let me know if there is anything \
> else. 
> Thomas Lübking wrote:
> would it not make more sense to just kill the pending (and now worthless?) thread \
> instead (no idea how expensive this spellchecking is to be threaded) to not waste \
> cpu on a result that will be ignored anyway? 
> David Faure wrote:
> Forcefully killing threads (QThread::terminate()) is a big no-no, it leaks memory, \
> potentially worse: it can leave a mutex locked, so any further thread trying to \
> lock it, will hang for ever. 
> The only way to terminate a thread is to ask it nicely to terminate from within \
> (i.e. to exit run()), which means it has to finish what it's currently doing, \
> first, before it looks up for the next command or posted event which tells it to \
> quit. 
> Thomas Lübking wrote:
> That is generally true and i admit to not know how the plasma spell checker is \
> implemented (or even "where" ;-), but would expect some sort of ro non-joinable \
> threading, operating on shared data (as the only purpose seems to prevent blocking \
> the main event loop - not running two checks at the same  time and merge their \
> results)  ... or does QThread alone bring structures to forbid such approach (The \
> API doc does not mention inevitable leaks for QThread itself on ::terminate())? 
> (PS: I just start to wonder whether the speller threads are then tagged to catch \
> 2,1 exits...) 
> Simeon Bird wrote:
> I don't entirely follow what you mean in the last comment; what do you mean 'tagged \
> to catch 2,1 exits'? 
> The spellchecking runner is, I think, very cheap compared to some of the other \
> runners (eg, the nepomuk search), all of which are called at once. 
> But, be that as it may, the spell-check has no control over the threading model of \
> krunner, which it just inherits from Plasma::AbstractRunner, and changing that \
> model for everything seems a bit overkill for this, even if we understood fully how \
> it works (I certainly don't; I just took the documentation at face value, which may \
> or may not be a good idea). 
> A reasonable alternative, if you don't like doing the commit in principle (the \
> mutex might well slow down the spell-check), is to just remove the problematic \
> feature (which has never worked anyway). If krunner only spell-checked words in the \
> default dictionary, we wouldn't need to call setLanguage(), and everything would be \
> fine, as it was before 2010. 
> Thomas Lübking wrote:
> Thread execution is not predictable so in general it's possible that you start a \
> thread (1), then start another thread (2) and thread 2 finishes before thread 1 \
> Since i assume that in this case the last exiting thread will determine the sanity \
> of the token, this would eg. mean that if you spellcheck "threa" and then "thread" \
> but "threa" finishes last, the token "thread" would be determined misspelled. 
> A common approach to handle this is to "tag" (basically with an increment) the \
> threads/results so that if you get a result from thread "1" and already got a \
> result from thread "2" before, you know that the result from thread "1" is already \
> irrelevant and simply drop it instead of polluting your result. 
> I have no particular opinion on this patch or the krunner spell checker in general \
> (except assuming that threading it will likely be overhead anyway - esp. if one \
> sequentially navigated through a tree structure - and given the type rate a normal \
> human being can achieve ;-)

Ah, I see. This is getting somewhat educational for me :)


- Simeon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/106242/#review18623
-----------------------------------------------------------


On Sept. 9, 2012, 10:06 p.m., Simeon Bird wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/106242/
> -----------------------------------------------------------
> 
> (Updated Sept. 9, 2012, 10:06 p.m.)
> 
> 
> Review request for kdelibs and Plasma.
> 
> 
> Description
> -------
> 
> Krunner's spellcheck plugin has been pretty broken since \
> bd291d21f096a714a171e7af3a534ba345ca5659 (about two years ago) because it called \
> Sonnet::Speller::setLanguage every time the spellchecker was invoked, which was \
> (very much) not thread-safe. 
> This patch makes Sonnet::Speller::setLanguage threadsafe by protecting all access \
> to the internal dict pointer using QReadWriteLock.  
> A related review request is 106244, which adds more fixes to the spellcheck \
> feature. 
> 
> This addresses bugs 264779 and 303831.
> http://bugs.kde.org/show_bug.cgi?id=264779
> http://bugs.kde.org/show_bug.cgi?id=303831
> 
> 
> Diffs
> -----
> 
> kdecore/sonnet/speller.cpp b19e74d 
> 
> Diff: http://git.reviewboard.kde.org/r/106242/diff/
> 
> 
> Testing
> -------
> 
> Compiled, installed, used for a week or so, spellchecked a bunch of things.
> 
> 
> Thanks,
> 
> Simeon Bird
> 
> 


[Attachment #3 (text/html)]

<html>
 <body>
  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
   <table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 \
solid;">  <tr>
     <td>
      This is an automatically generated e-mail. To reply, visit:
      <a href="http://git.reviewboard.kde.org/r/106242/">http://git.reviewboard.kde.org/r/106242/</a>
  </td>
    </tr>
   </table>
   <br />





<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <p style="margin-top: 0;">On September 6th, 2012, 12:29 p.m., <b>David \
Faure</b> wrote:</p>  <blockquote style="margin-left: 1em; border-left: 2px solid \
#d0d0d0; padding-left: 10px;">  <pre style="white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">A number of comments on the implementation, but also a more general \
comment: does it even make sense to use a Speller in multiple threads, and to  change \
the language from one thread while another one is using the speller for \
spell-checking? It sounds like, even if a mutex/lock can prevent crashes, this is a \
weird thing to do anyway, since you have no idea at which point the spell-checking \
will switch to another language... could happen in the very middle of a sentence...

Maybe it would make more sense to &quot;post&quot; the language change operation to \
the spell-checking thread using the same mechanism as the one used to \
&quot;post&quot; spellchecking requests to it? (Disclaimer: I know nothing of the \
krunner architecture).</pre>  </blockquote>




 <p>On September 9th, 2012, 10:05 p.m., <b>Simeon Bird</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Thanks for the review. 

I agree that this is a slightly weird thing to do, but, as you surmise, it was the \
only way I could think of to make the feature work properly. 

As far as I understand it (from reading the documentation), the way krunner works is \
to call Plasma::AbstractRunner::match in a new thread for every runner every time \
input is entered. 

So if I enter:

&quot;spell hello&quot;

I will be called with &#39;s&#39; &#39;sp&#39; &#39;spe&#39;...and so on until I get \
the whole match, without waiting for the previous threads to return. Thus match has \
to be thread-safe.

The feature I&#39;m trying to fix here is to be able to enter &quot;spell french \
bonjour&quot; and have it spell something.  The runner is responsible for parsing the \
input and checking whether it should do anything else. So we don&#39;t actually post \
spellchecking requests, we just post some input, and detect that we should \
spell-check, and change the language, by parsing strings. So I couldn&#39;t see a way \
to do this except to call Sonnet::Speller::setLanguage() in \
Plasma::AbstractRunner::match,  which meant that for it to work I had to make match \
thread-safe. 

Sorry. I&#39;m open to suggestions if you have a better idea. 

I&#39;ve fixed your other comments in v2 of the diff, let me know if there is \
anything else.</pre>  </blockquote>





 <p>On September 10th, 2012, 11:06 a.m., <b>Thomas Lübking</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">would it not make more \
sense to just kill the pending (and now worthless?) thread instead (no idea how \
expensive this spellchecking is to be threaded) to not waste cpu on a result that \
will be ignored anyway?</pre>  </blockquote>





 <p>On September 10th, 2012, 11:16 a.m., <b>David Faure</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Forcefully killing \
threads (QThread::terminate()) is a big no-no, it leaks memory, potentially worse: it \
can leave a mutex locked, so any further thread trying to lock it, will hang for \
ever.

The only way to terminate a thread is to ask it nicely to terminate from within (i.e. \
to exit run()), which means it has to finish what it&#39;s currently doing, first, \
before it looks up for the next command or posted event which tells it to quit.</pre> \
</blockquote>





 <p>On September 10th, 2012, 1:57 p.m., <b>Thomas Lübking</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">That is generally true \
and i admit to not know how the plasma spell checker is implemented (or even \
&quot;where&quot; ;-), but would expect some sort of ro non-joinable threading, \
operating on shared data (as the only purpose seems to prevent blocking the main \
event loop - not running two checks at the same  time and merge their results)  ... \
or does QThread alone bring structures to forbid such approach (The API doc does not \
mention inevitable leaks for QThread itself on ::terminate())?

(PS: I just start to wonder whether the speller threads are then tagged to catch 2,1 \
exits...)</pre>  </blockquote>





 <p>On September 10th, 2012, 11:55 p.m., <b>Simeon Bird</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I don&#39;t entirely \
follow what you mean in the last comment; what do you mean &#39;tagged to catch 2,1 \
exits&#39;?

The spellchecking runner is, I think, very cheap compared to some of the other \
runners (eg, the nepomuk search), all of which are called at once.

But, be that as it may, the spell-check has no control over the threading model of \
krunner, which it just inherits from Plasma::AbstractRunner, and changing that model \
for everything seems a bit overkill for this, even if we understood fully how it \
works (I certainly don&#39;t; I just took the documentation at face value, which may \
or may not be a good idea).

A reasonable alternative, if you don&#39;t like doing the commit in principle (the \
mutex might well slow down the spell-check), is to just remove the problematic \
feature (which has never worked anyway). If krunner only spell-checked words in the \
default dictionary, we wouldn&#39;t need to call setLanguage(), and everything would \
be fine, as it was before 2010. </pre>  </blockquote>





 <p>On September 11th, 2012, 12:53 a.m., <b>Thomas Lübking</b> wrote:</p>
 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: \
10px;">  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Thread execution is not \
predictable so in general it&#39;s possible that you start a thread (1), then start \
another thread (2) and thread 2 finishes before thread 1 Since i assume that in this \
case the last exiting thread will determine the sanity of the token, this would eg. \
mean that if you spellcheck &quot;threa&quot; and then &quot;thread&quot; but \
&quot;threa&quot; finishes last, the token &quot;thread&quot; would be determined \
misspelled.

A common approach to handle this is to &quot;tag&quot; (basically with an increment) \
the threads/results so that if you get a result from thread &quot;1&quot; and already \
got a result from thread &quot;2&quot; before, you know that the result from thread \
&quot;1&quot; is already irrelevant and simply drop it instead of polluting your \
result.

I have no particular opinion on this patch or the krunner spell checker in general \
(except assuming that threading it will likely be overhead anyway - esp. if one \
sequentially navigated through a tree structure - and given the type rate a normal \
human being can achieve ;-)</pre>  </blockquote>








</blockquote>

<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: \
-pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Ah, I see. This is \
getting somewhat educational for me :)</pre> <br />








<p>- Simeon</p>


<br />
<p>On September 9th, 2012, 10:06 p.m., Simeon Bird wrote:</p>






<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" \
style="background-image: \
url('http://git.reviewboard.kde.org/media/rb/images/review_request_box_top_bg.png'); \
background-position: left top; background-repeat: repeat-x; border: 1px black \
solid;">  <tr>
  <td>

<div>Review request for kdelibs and Plasma.</div>
<div>By Simeon Bird.</div>


<p style="color: grey;"><i>Updated Sept. 9, 2012, 10:06 p.m.</i></p>






<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" \
style="border: 1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Krunner&#39;s spellcheck plugin has been pretty broken since \
bd291d21f096a714a171e7af3a534ba345ca5659 (about two years ago) because it called \
Sonnet::Speller::setLanguage every time the spellchecker was invoked, which was (very \
much) not thread-safe.

This patch makes Sonnet::Speller::setLanguage threadsafe by protecting all access to \
the internal dict pointer using QReadWriteLock. 

A related review request is 106244, which adds more fixes to the spellcheck \
feature.</pre>  </td>
 </tr>
</table>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: \
1px solid #b8b5a0">  <tr>
  <td>
   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: \
-moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: \
break-word;">Compiled, installed, used for a week or so, spellchecked a bunch of \
things.</pre>  </td>
 </tr>
</table>



<div style="margin-top: 1.5em;">
 <b style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Bugs: </b>


 <a href="http://bugs.kde.org/show_bug.cgi?id=264779">264779</a>, 

 <a href="http://bugs.kde.org/show_bug.cgi?id=303831">303831</a>


</div>


<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">

 <li>kdecore/sonnet/speller.cpp <span style="color: grey">(b19e74d)</span></li>

</ul>

<p><a href="http://git.reviewboard.kde.org/r/106242/diff/" style="margin-left: \
3em;">View Diff</a></p>




  </td>
 </tr>
</table>








  </div>
 </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic