
List:       kde-core-devel
Subject:    Re: Review Request 114717: Language detection in Sonnet
From:       "Martin Tobias Holmedahl Sandsmark" <martin.sandsmark () kde ! org>
Date:       2014-01-08 19:24:50
Message-ID: 20140108192450.16659.17058 () probe ! kde ! org

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/114717/
-----------------------------------------------------------

(Updated Jan. 8, 2014, 7:24 p.m.)


Status
------

This change has been marked as submitted.


Review request for kdelibs and KDEPIM.


Repository: sonnet


Description
-------

I started by merging in the old language detection branch from SVN, improving
it as I went along. One improvement was to use QChar's Unicode information instead of
shipping our own tables of Unicode code point information. The old filter class was
also replaced with a new tokenizer, most of which I rewrote to simplify it.
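
For readers unfamiliar with trigram-based detection, here is a minimal sketch of
the general idea behind per-language trigram data like the data/trigrams/ files
listed below. It is not the code under review: the names buildProfile,
rankDistance and guessLanguage are hypothetical, it uses only the C++ standard
library rather than Qt, and it assumes plain ASCII input. Each language gets a
ranked list of its most frequent three-character sequences, and a text is
assigned to the language whose ranking it differs from least.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Rank of each trigram in a text: 0 is the most frequent.
// All names in this sketch are hypothetical, not Sonnet API.
using Profile = std::map<std::string, std::size_t>;
using Entry = std::pair<std::string, std::size_t>;

// Build a ranked trigram profile from plain ASCII text (an assumption made to
// keep the sketch short; real input needs proper Unicode handling).
Profile buildProfile(const std::string &text, std::size_t maxTrigrams = 300)
{
    std::map<std::string, std::size_t> counts;
    for (std::size_t i = 0; i + 3 <= text.size(); ++i) {
        ++counts[text.substr(i, 3)];
    }

    std::vector<Entry> sorted(counts.begin(), counts.end());
    // Most frequent first; ties keep the alphabetical order of the map.
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const Entry &a, const Entry &b) { return a.second > b.second; });

    Profile profile;
    for (std::size_t rank = 0; rank < sorted.size() && rank < maxTrigrams; ++rank) {
        profile[sorted[rank].first] = rank;
    }
    return profile;
}

// Sum of rank differences, with a fixed penalty for trigrams the model has
// never seen. Lower means more similar.
std::size_t rankDistance(const Profile &sample, const Profile &model, std::size_t penalty = 300)
{
    std::size_t total = 0;
    for (const auto &entry : sample) {
        const auto it = model.find(entry.first);
        if (it == model.end()) {
            total += penalty;
        } else if (entry.second > it->second) {
            total += entry.second - it->second;
        } else {
            total += it->second - entry.second;
        }
    }
    return total;
}

// Return the name of the closest model, or an empty string if there are none.
std::string guessLanguage(const std::string &text, const std::map<std::string, Profile> &models)
{
    const Profile sample = buildProfile(text);
    std::string best;
    std::size_t bestDistance = 0;
    bool found = false;
    for (const auto &model : models) {
        const std::size_t d = rankDistance(sample, model.second);
        if (!found || d < bestDistance) {
            best = model.first;
            bestDistance = d;
            found = true;
        }
    }
    return best;
}
```

In this sketch the models would be built once from sample text per language and
then reused for every call to guessLanguage. Short inputs contain few trigrams,
so the result is less reliable for them.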

I added kdepim to the reviewers because I remember talking on IRC with someone
working on PIM stuff who was interested in this (a long time ago, though).


Diffs
-----

  data/trigrams/ja PRE-CREATION 
  data/trigrams/kk PRE-CREATION 
  data/trigrams/ko PRE-CREATION 
  data/trigrams/ky PRE-CREATION 
  data/trigrams/la PRE-CREATION 
  data/trigrams/lt PRE-CREATION 
  data/trigrams/lv PRE-CREATION 
  data/trigrams/mk PRE-CREATION 
  data/trigrams/mn PRE-CREATION 
  data/trigrams/nb PRE-CREATION 
  data/trigrams/ne PRE-CREATION 
  data/trigrams/nl PRE-CREATION 
  data/trigrams/nr PRE-CREATION 
  data/trigrams/pl PRE-CREATION 
  data/trigrams/ps PRE-CREATION 
  data/trigrams/pt PRE-CREATION 
  data/trigrams/pt_BR PRE-CREATION 
  data/trigrams/pt_PT PRE-CREATION 
  data/trigrams/ro PRE-CREATION 
  data/trigrams/ru PRE-CREATION 
  data/trigrams/sk PRE-CREATION 
  data/trigrams/sl PRE-CREATION 
  data/trigrams/so PRE-CREATION 
  data/trigrams/sq PRE-CREATION 
  data/trigrams/sr PRE-CREATION 
  data/trigrams/ss PRE-CREATION 
  data/trigrams/st PRE-CREATION 
  data/trigrams/sv PRE-CREATION 
  data/trigrams/sw PRE-CREATION 
  data/trigrams/th PRE-CREATION 
  data/trigrams/tl PRE-CREATION 
  data/trigrams/tn PRE-CREATION 
  data/trigrams/tr PRE-CREATION 
  data/trigrams/ts PRE-CREATION 
  data/trigrams/uk PRE-CREATION 
  data/trigrams/ur PRE-CREATION 
  data/trigrams/uz PRE-CREATION 
  data/trigrams/ve PRE-CREATION 
  data/trigrams/vi PRE-CREATION 
  data/trigrams/xh PRE-CREATION 
  data/trigrams/zu PRE-CREATION 
  sonnet.yaml c54f87b 
  src/CMakeLists.txt e79492f 
  src/core/CMakeLists.txt 2f8a184 
  src/core/backgroundchecker.cpp 8b9e983 
  src/core/backgroundchecker_p.h PRE-CREATION 
  src/core/backgroundengine.cpp 3a14d34 
  src/core/backgroundengine_p.h 10f6a27 
  src/core/client_p.h bd3e416 
  src/core/filter.cpp e99d332 
  src/core/filter_p.h 6c7d8c9 
  src/core/globals.h 0c54c96 
  src/core/globals.cpp e57450f 
  src/core/guesslanguage.h PRE-CREATION 
  src/core/guesslanguage.cpp PRE-CREATION 
  src/core/languagefilter.cpp PRE-CREATION 
  src/core/languagefilter_p.h PRE-CREATION 
  src/core/loader.cpp ee8db0e 
  src/core/settings.cpp 095eddb 
  src/core/settings_p.h ee2d22c 
  src/core/speller.h 7428339 
  src/core/speller.cpp 8cc2a1e 
  src/core/textbreaks.cpp PRE-CREATION 
  src/core/textbreaks_p.h PRE-CREATION 
  src/core/tokenizer.cpp PRE-CREATION 
  src/core/tokenizer_p.h PRE-CREATION 
  src/plugins/CMakeLists.txt fc33a97 
  src/plugins/aspell/kspell_aspellclient.h eadb52a 
  src/plugins/enchant/CMakeLists.txt 817db0c 
  src/plugins/enchant/enchantclient.h 25f62eb 
  src/plugins/hspell/CMakeLists.txt e128cb3 
  src/plugins/hspell/kspell_hspellclient.h 966303f 
  src/plugins/hunspell/CMakeLists.txt ccae7f7 
  src/plugins/hunspell/kspell_hunspellclient.h 79638bb 
  src/ui/configui.ui 6532552 
  src/ui/configwidget.cpp 7a5cc99 
  src/ui/dialog.cpp 13ad39d 
  src/ui/highlighter.h 46418b9 
  src/ui/highlighter.cpp 9f31268 
  src/unicode/CMakeLists.txt 1be0a54 
  src/unicode/README f9b8030 
  src/unicode/data/GraphemeBreakProperty.txt 8805f36 
  src/unicode/data/SentenceBreakProperty.txt fc58820 
  src/unicode/data/WordBreakProperty.txt 78c531c 
  src/unicode/parseucd/parseucd.cpp a050140 
  tests/test_dialog.cpp 0579bb2 
  tests/test_highlighter.h 9cf5657 
  tests/test_highlighter.cpp 695a2df 
  tests/test_textedit.cpp 5c02809 
  data/trigrams/fr PRE-CREATION 
  data/trigrams/ha PRE-CREATION 
  data/trigrams/hi PRE-CREATION 
  data/trigrams/hr PRE-CREATION 
  data/trigrams/hu PRE-CREATION 
  data/trigrams/id PRE-CREATION 
  data/trigrams/is PRE-CREATION 
  data/trigrams/it PRE-CREATION 
  data/parsetrigrams.cpp PRE-CREATION 
  data/trigrams/af PRE-CREATION 
  data/trigrams/ar PRE-CREATION 
  data/trigrams/az PRE-CREATION 
  data/trigrams/bg PRE-CREATION 
  data/trigrams/ca PRE-CREATION 
  data/trigrams/cs PRE-CREATION 
  data/trigrams/cy PRE-CREATION 
  data/trigrams/da PRE-CREATION 
  data/trigrams/de PRE-CREATION 
  data/trigrams/en PRE-CREATION 
  data/trigrams/es PRE-CREATION 
  data/trigrams/et PRE-CREATION 
  data/trigrams/eu PRE-CREATION 
  data/trigrams/fa PRE-CREATION 
  data/trigrams/fi PRE-CREATION 
  CMakeLists.txt 1fdcf1e 
  README.md 63e2c6a 
  autotests/CMakeLists.txt e9fc573 
  data/CMakeLists.txt PRE-CREATION 

Diff: https://git.reviewboard.kde.org/r/114717/diff/


Testing
-------

Mostly tested using test_highlighter.


Thanks,

Martin Tobias Holmedahl Sandsmark




