List: kde-core-devel
Subject: Re: Review Request 114717: Language detection in Sonnet
From: "Martin Tobias Holmedahl Sandsmark" <martin.sandsmark () kde ! org>
Date: 2014-01-08 19:24:50
Message-ID: 20140108192450.16659.17058 () probe ! kde ! org
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/114717/
-----------------------------------------------------------
(Updated Jan. 8, 2014, 7:24 p.m.)
Status
------
This change has been marked as submitted.
Review request for kdelibs and KDEPIM.
Repository: sonnet
Description
-------
I started by merging in the old language detection branch from SVN, improving it
as I went along. One improvement was to use QChar's Unicode information instead of
shipping our own tables of Unicode code point information. The old filter class was
also replaced with a new tokenizer, most of which I rewrote to simplify it.
I added kdepim to the reviewers because I remember talking on IRC with someone
working on PIM stuff who was interested in this (a long time ago, though).
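
For readers unfamiliar with trigram-based language guessing, here is a minimal
sketch of the general idea (rank the trigrams of a text by frequency and compare
that ranking against per-language models). It is not the code under review. The
names rankedTrigrams, rankDistance and guessLanguage are hypothetical, the sketch
works on bytes in std::string rather than on QString/QChar, the miss penalty is an
arbitrary choice, and the models are plain in-memory lists because the format of
the data/trigrams/ files is not described in this message.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: return the distinct 3-byte sequences of `text`,
// most frequent first (ties keep lexicographic order), capped at maxCount.
std::vector<std::string> rankedTrigrams(const std::string &text,
                                        std::size_t maxCount = 300)
{
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i + 3 <= text.size(); ++i) {
        ++counts[text.substr(i, 3)];
    }

    std::vector<std::pair<std::string, int> > sorted(counts.begin(), counts.end());
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const std::pair<std::string, int> &a,
                        const std::pair<std::string, int> &b) {
                         return a.second > b.second;
                     });

    std::vector<std::string> result;
    for (std::size_t i = 0; i < sorted.size() && result.size() < maxCount; ++i) {
        result.push_back(sorted[i].first);
    }
    return result;
}

// Hypothetical helper: "out of place" distance between two rankings.
// A trigram missing from the model costs model.size() (an assumption).
std::size_t rankDistance(const std::vector<std::string> &sample,
                         const std::vector<std::string> &model)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < sample.size(); ++i) {
        std::vector<std::string>::const_iterator it =
            std::find(model.begin(), model.end(), sample[i]);
        if (it == model.end()) {
            total += model.size();
        } else {
            const std::size_t j = static_cast<std::size_t>(it - model.begin());
            total += (i > j) ? (i - j) : (j - i);
        }
    }
    return total;
}

// Hypothetical helper: return the key of the model with the smallest
// distance to `text`, or an empty string if there are no models.
std::string guessLanguage(
    const std::string &text,
    const std::map<std::string, std::vector<std::string> > &models)
{
    const std::vector<std::string> sample = rankedTrigrams(text);
    std::string best;
    std::size_t bestDistance = 0;
    bool haveBest = false;
    std::map<std::string, std::vector<std::string> >::const_iterator it;
    for (it = models.begin(); it != models.end(); ++it) {
        const std::size_t d = rankDistance(sample, it->second);
        if (!haveBest || d < bestDistance) {
            best = it->first;
            bestDistance = d;
            haveBest = true;
        }
    }
    return best;
}
```

With a toy model of {"the", "he ", "and"} under the key "en" and {"der", "er ",
"und"} under "de", guessLanguage("the the and", models) picks "en". Real models
would be far larger and built from representative text.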
Diffs
-----
data/trigrams/ja PRE-CREATION
data/trigrams/kk PRE-CREATION
data/trigrams/ko PRE-CREATION
data/trigrams/ky PRE-CREATION
data/trigrams/la PRE-CREATION
data/trigrams/lt PRE-CREATION
data/trigrams/lv PRE-CREATION
data/trigrams/mk PRE-CREATION
data/trigrams/mn PRE-CREATION
data/trigrams/nb PRE-CREATION
data/trigrams/ne PRE-CREATION
data/trigrams/nl PRE-CREATION
data/trigrams/nr PRE-CREATION
data/trigrams/pl PRE-CREATION
data/trigrams/ps PRE-CREATION
data/trigrams/pt PRE-CREATION
data/trigrams/pt_BR PRE-CREATION
data/trigrams/pt_PT PRE-CREATION
data/trigrams/ro PRE-CREATION
data/trigrams/ru PRE-CREATION
data/trigrams/sk PRE-CREATION
data/trigrams/sl PRE-CREATION
data/trigrams/so PRE-CREATION
data/trigrams/sq PRE-CREATION
data/trigrams/sr PRE-CREATION
data/trigrams/ss PRE-CREATION
data/trigrams/st PRE-CREATION
data/trigrams/sv PRE-CREATION
data/trigrams/sw PRE-CREATION
data/trigrams/th PRE-CREATION
data/trigrams/tl PRE-CREATION
data/trigrams/tn PRE-CREATION
data/trigrams/tr PRE-CREATION
data/trigrams/ts PRE-CREATION
data/trigrams/uk PRE-CREATION
data/trigrams/ur PRE-CREATION
data/trigrams/uz PRE-CREATION
data/trigrams/ve PRE-CREATION
data/trigrams/vi PRE-CREATION
data/trigrams/xh PRE-CREATION
data/trigrams/zu PRE-CREATION
sonnet.yaml c54f87b
src/CMakeLists.txt e79492f
src/core/CMakeLists.txt 2f8a184
src/core/backgroundchecker.cpp 8b9e983
src/core/backgroundchecker_p.h PRE-CREATION
src/core/backgroundengine.cpp 3a14d34
src/core/backgroundengine_p.h 10f6a27
src/core/client_p.h bd3e416
src/core/filter.cpp e99d332
src/core/filter_p.h 6c7d8c9
src/core/globals.h 0c54c96
src/core/globals.cpp e57450f
src/core/guesslanguage.h PRE-CREATION
src/core/guesslanguage.cpp PRE-CREATION
src/core/languagefilter.cpp PRE-CREATION
src/core/languagefilter_p.h PRE-CREATION
src/core/loader.cpp ee8db0e
src/core/settings.cpp 095eddb
src/core/settings_p.h ee2d22c
src/core/speller.h 7428339
src/core/speller.cpp 8cc2a1e
src/core/textbreaks.cpp PRE-CREATION
src/core/textbreaks_p.h PRE-CREATION
src/core/tokenizer.cpp PRE-CREATION
src/core/tokenizer_p.h PRE-CREATION
src/plugins/CMakeLists.txt fc33a97
src/plugins/aspell/kspell_aspellclient.h eadb52a
src/plugins/enchant/CMakeLists.txt 817db0c
src/plugins/enchant/enchantclient.h 25f62eb
src/plugins/hspell/CMakeLists.txt e128cb3
src/plugins/hspell/kspell_hspellclient.h 966303f
src/plugins/hunspell/CMakeLists.txt ccae7f7
src/plugins/hunspell/kspell_hunspellclient.h 79638bb
src/ui/configui.ui 6532552
src/ui/configwidget.cpp 7a5cc99
src/ui/dialog.cpp 13ad39d
src/ui/highlighter.h 46418b9
src/ui/highlighter.cpp 9f31268
src/unicode/CMakeLists.txt 1be0a54
src/unicode/README f9b8030
src/unicode/data/GraphemeBreakProperty.txt 8805f36
src/unicode/data/SentenceBreakProperty.txt fc58820
src/unicode/data/WordBreakProperty.txt 78c531c
src/unicode/parseucd/parseucd.cpp a050140
tests/test_dialog.cpp 0579bb2
tests/test_highlighter.h 9cf5657
tests/test_highlighter.cpp 695a2df
tests/test_textedit.cpp 5c02809
data/trigrams/fr PRE-CREATION
data/trigrams/ha PRE-CREATION
data/trigrams/hi PRE-CREATION
data/trigrams/hr PRE-CREATION
data/trigrams/hu PRE-CREATION
data/trigrams/id PRE-CREATION
data/trigrams/is PRE-CREATION
data/trigrams/it PRE-CREATION
data/parsetrigrams.cpp PRE-CREATION
data/trigrams/af PRE-CREATION
data/trigrams/ar PRE-CREATION
data/trigrams/az PRE-CREATION
data/trigrams/bg PRE-CREATION
data/trigrams/ca PRE-CREATION
data/trigrams/cs PRE-CREATION
data/trigrams/cy PRE-CREATION
data/trigrams/da PRE-CREATION
data/trigrams/de PRE-CREATION
data/trigrams/en PRE-CREATION
data/trigrams/es PRE-CREATION
data/trigrams/et PRE-CREATION
data/trigrams/eu PRE-CREATION
data/trigrams/fa PRE-CREATION
data/trigrams/fi PRE-CREATION
CMakeLists.txt 1fdcf1e
README.md 63e2c6a
autotests/CMakeLists.txt e9fc573
data/CMakeLists.txt PRE-CREATION
Diff: https://git.reviewboard.kde.org/r/114717/diff/
Testing
-------
Tested mostly with test_highlighter.
Thanks,
Martin Tobias Holmedahl Sandsmark