List: kde-frameworks-devel
Subject: Re: Scrap Baloo Thread Feedback
From: Boudhayan Gupta <bgupta () kde ! org>
Date: 2016-10-17 3:27:00
Message-ID: CAKDS=N=zxdcaBLz4W6WWFPEVq8efjdjLe4=P=wqJ9rOqhj7c1A () mail ! gmail ! com
Hi,
Unfortunately I've been hit by multiple pretty severe health scares in the
last month, and have no idea when I'm going to be at 100% again.
For the time being I'd rather not hold up any development, so don't hold
back anything on my account.
-- Boudhayan
On 16 October 2016 at 17:46, Christoph Cullmann <cullmann@absint.com> wrote:
> Hi,
>
> (evil top posting)
>
> given the silence, I assume any interest in baloo has stopped once more, or?
> Or are there any plans how to fixup the current situation?
>
> Greetings
> Christoph
>
> ----- On 7 Oct 2016 at 20:08, cullmann cullmann@absint.com wrote:
>
> > Hi,
> >
> >> Hey
> >>
> >> On Fri, Oct 7, 2016 at 6:34 PM, Christoph Cullmann <cullmann@absint.com> wrote:
> >>> Hi,
> >>>
> >>>> On Fri, Oct 7, 2016 at 5:58 PM, Christoph Cullmann <cullmann@absint.com> wrote:
> >>>>>
> >>>
> >>> 1) No handling of DB errors besides asserting
> >>> 2) No handling of errors in the extractors (e.g. see the fixes I did;
> >>> all extractors will need more of that)
> >>> 3) No handling of NFS/large inodes/inconsistencies => crash
> >>>
> >>> In the end, in my opinion, you can rewrite close to all parts dealing
> >>> with the DB or any other thing internally. If anything ever gets
> >>> inconsistent, ATM you are doomed, forever, unless by luck my new
> >>> startup code deletes the index; then you live again until it is
> >>> reindexed.
> >>>
> >>>>
> >>> I am not sure. I am all for removing the complete indexing and using
> >>> another indexer like tracker, exactly to avoid the excursion into the
> >>> DB world and how to handle it in a safe way with close to zero
> >>> manpower.
> >>>
> >>
> >> It's avoiding the problem and hoping for the best, without any
> >> experiments.
> > That is not true.
> >
> > I did experiments and search works with tracker, but yes, a problem is
> > tagging, which ATM doesn't work. Nor do I say that it is a ready solution
> > now, just a possibility to avoid having to maintain low-level code with
> > at most 1 person (how it looks ATM).
> >
> > And I don't propose to go down that road now, but ATM I see nobody doing
> > any other experiments.
> >
> > Besides, tracker has been constantly maintained and used for >> 5 years:
> >
> > https://github.com/GNOME/tracker/graphs/contributors
> >
> >>
> >>>
> >>> => It is good that we agree, but I find it very astonishing that we
> >>> use baloo in its current state more or less mandatorily on all those
> >>> systems where it will fail by design.
> >>>
> >>> (and it fails if you read the bugs)
> >>>
> >>
> >> There is a certain amount of failure, but it's not "by-design". But
> >> maybe I'm not seeing things clearly.
> > You yourself stated that neither 32-bit issues nor NFS nor > 32-bit
> > inodes have any error handling. And that seems to have been known even
> > during design, and still we now have this as a framework used by default
> > by any Plasma installation on systems featuring exactly that, without
> > error checking.
> >
> >>
> >>>>
> >>>>>>
> >>>>>> How about requirements such as resource consumption, ease of
> >>>>>> integration, search speed are taken into consideration? Come on
> >>>>>> guys. We're engineers over here.
> >>
> >>>>> What is the argument here? If you take a look at bugs.kde.org, you
> >>>>> see that people are complaining about all of that with baloo. I see
> >>>>> no evidence anywhere that e.g. baloo is "superior" to what GNOME uses
> >>>>> or any other solution (perhaps besides nepomuk, ok...).
> >>
> >> What tests have been done to obtain the evidence?
> > What tests have been done to obtain the inverse evidence? I only hear
> > the complaint here about not taking requirements like resource
> > consumption or speed into account, but there is ATM zero evidence that
> > e.g. tracker is slower.
> >
> > And yes, there are "it hogs 100% memory or time" bugs open, though you
> > can hardly reproduce them, as people are somehow scared to pack up their
> > home and send it to us. Not that a lot of those bugs got touched at all
> > in Bugzilla.
> >
> >>
> >>>
> >>>>
> >>>> Yup, you have. It's awesome. I no longer have the motivation to work
> >>>> on Baloo.
> >>> Thanks, but that makes me very sad, btw.
> >>> Baloo came up to replace nepomuk, which was dead because it had too
> >>> many bugs and all maintainers left.
> >>> Now we have baloo, which has many bugs, some even by design, and the
> >>> maintainer left, too.
> >>>
> >>
> >> Actually, Nepomuk was not dead. I was maintaining it. I killed it
> >> because it had too many structural problems.
> >>
> >> This is how the open source world works. People work on projects and
> >> when it no longer scratches their itch (I no longer use Baloo), they
> >> lose interest. This is "supposed" to be a hobby.
> > That is ok, to see it as a hobby.
> >
> > But I am a bit unnerved that one proposes this as the generic index
> > solution for our desktop, which should be stable, if nothing else, while
> > knowing that it has severe limitations that are not handled (see above).
> > I would have assumed that at least the known "can't work here" cases are
> > handled in a graceful way.
> >
> > And given already one of the first things main.cpp of baloo_file does is:
> >
> > // HACK: Untill we start using lmdb with robust mutex support. We're just going to remove
> > // the lock manually in the baloo_file process.
> > QFile::remove(path + "/index-lock");
> >
> > that doesn't leave high hopes, sorry.
> >
> > And the typical error check is:
> >
> > void MTimeDB::put(quint32 mtime, quint64 docId)
> > {
> >     Q_ASSERT(mtime > 0);
> >     Q_ASSERT(docId > 0);
> >
> >     MDB_val key;
> >     key.mv_size = sizeof(quint32);
> >     key.mv_data = static_cast<void*>(&mtime);
> >
> >     MDB_val val;
> >     val.mv_size = sizeof(quint64);
> >     val.mv_data = static_cast<void*>(&docId);
> >
> >     int rc = mdb_put(m_txn, m_dbi, &key, &val, 0);
> >     Q_ASSERT_X(rc == 0, "MTimeDB::put", mdb_strerror(rc));
> > }
> >
> > without any way to pass an error to the outside, nor any error handling
> > code at the outside, as no error can ever occur that is non-fatal.
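
A minimal sketch of the alternative asked for above, where the put operation reports failure to its caller instead of asserting. `DbResult`, `PutFn` and `putMTime` are hypothetical names, not Baloo or LMDB API. The storage call (`mdb_put` in the quoted code) is replaced by an injected function pointer so the sketch compiles without LMDB:

```cpp
#include <cstdint>
#include <string>

// Hypothetical result type, not part of Baloo or LMDB.
struct DbResult {
    int code;             // 0 means success, mirroring the rc == 0 check above
    std::string message;  // human-readable reason on failure
    bool ok() const { return code == 0; }
};

// Stand-in for the storage call; an assumption made for illustration only.
using PutFn = int (*)(std::uint32_t key, std::uint64_t value);

// Validates its arguments and forwards the storage error to the caller
// instead of aborting the process.
DbResult putMTime(PutFn rawPut, std::uint32_t mtime, std::uint64_t docId)
{
    if (mtime == 0 || docId == 0) {
        return {-1, "invalid argument: mtime and docId must be non-zero"};
    }
    const int rc = rawPut(mtime, docId);
    if (rc != 0) {
        return {rc, "storage layer reported error " + std::to_string(rc)};
    }
    return {0, ""};
}
```

With a shape like this, every caller has to decide what a failed write means (skip the file, retry, or schedule a reindex), which is the "rewrite the internal layers" cost discussed in this thread.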
> >
> >>
> >>>
> >>>> (This is why they run on a separate process)
> >>> That doesn't help, it just OOMs your system => dead. It needs resource
> >>> restrictions, which are tricky to get right.
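
One possible form of such a resource restriction, sketched under the assumption of a POSIX system: a helper process caps its own address space with `setrlimit` before it starts parsing files, so a runaway extractor fails its own allocations instead of exhausting system memory. `capAddressSpace` is a hypothetical helper, not something Baloo provides, and choosing a sensible limit is exactly the tricky part mentioned above:

```cpp
#include <sys/resource.h>

// Hypothetical helper: lower (or set) the soft address-space limit of the
// calling process. The request is clamped to the hard limit, because an
// unprivileged process cannot raise its soft limit beyond that.
bool capAddressSpace(rlim_t maxBytes)
{
    struct rlimit lim{};
    if (getrlimit(RLIMIT_AS, &lim) != 0) {
        return false;
    }

    rlim_t wanted = maxBytes;
    if (lim.rlim_max != RLIM_INFINITY
        && (wanted == RLIM_INFINITY || wanted > lim.rlim_max)) {
        wanted = lim.rlim_max;
    }

    lim.rlim_cur = wanted;
    return setrlimit(RLIMIT_AS, &lim) == 0;
}
```

This only bounds virtual address space for one process; it does not limit CPU time or I/O, and memory-mapped databases count against the limit as well.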
> >>>
> >>
> >> You're right. It needs a better thought out solution. A separate
> >> process is the bare minimum.
> >>
> >> Btw, have you looked if Tracker actually does any of this?
> > It has process separation and it handles crashes well enough to not
> > screw up client process queries. And it has maintained extractors or
> > miners, unlike us. But for sure, it has bugs and crashes and all things,
> > but it is maintained and has had a constant stream of fixes for a longer
> > time than baloo + all predecessors together.
> >
> >>
> >>>> My hostility was because the proposal ignores key points such as -
> >>>>
> >>>> * Indexing Speed
> >>>> * Search speed
> >>>> * Database size
> >>> => If you look at the bugs, people complain we are inferior, and I
> >>> don't see that the proposal ignores it. I just don't see how to
> >>> compare, given there are no hard facts that we are faster than e.g.
> >>> tracker in any way.
> >>>
> >>
> >> Data can be gathered about it. Not all data is publicly available.
> > That would make any decision easier to take.
> >
> >>
> >>>> * Ease of use with our existing components
> >>> My proposal did not change the interface at all, it has zero impact on
> >>> "ease of use".
> >>>
> >>>> * Ease of fixing problems in the code
> >>> My estimate would be: rewrite close to everything. Even the basic
> >>> 64-bit int id won't work with 64-bit inodes, each DB call must be
> >>> touched to check for errors, and at each place one will need to check
> >>> for potential inconsistencies and exit gracefully...
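
To illustrate the id problem, here is a sketch that assumes a 64-bit document id built from a 32-bit device id and a 32-bit inode. That layout is an assumption made for this example, not a documented Baloo format, and `makeDocId` is a hypothetical name. The point is that an inode wider than 32 bits cannot be packed, so the function reports that case instead of silently truncating:

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Assumed layout for illustration: low 32 bits = device id, high 32 bits = inode.
// Returns std::nullopt when either value does not fit into its 32-bit half.
std::optional<std::uint64_t> makeDocId(std::uint64_t deviceId, std::uint64_t inode)
{
    constexpr std::uint64_t max32 = std::numeric_limits<std::uint32_t>::max();
    if (deviceId > max32 || inode > max32) {
        return std::nullopt;  // would collide with another file if truncated
    }
    return (inode << 32) | deviceId;
}
```

A caller can then skip the file or log the condition, which is the graceful "can't work here" handling requested earlier in this thread.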
> >>>
> >>
> >> I don't follow why everything needs to be re-written? Am I missing
> >> something or do we just need to check for more errors and use a higher
> >> integer id? This certainly doesn't seem super trivial, but it sounds
> >> like less work than implementing a shim on top of Tracker.
> > If you look at your own code, you will see that there is no error
> > handling at all, besides asserts. (see above)
> >
> > There is not even the concept of passing an error out to higher levels.
> >
> > Perhaps I am wrong, because there is only a bit of documentation in
> > addition, but if you start to add error handling at the DB calls, you
> > can start to rewrite all internal layers.
> >
> > Besides, I don't see any documentation of the DB format, but I could have
> > missed that (at least it is neither in the git nor at
> > https://community.kde.org/Baloo).
> >
> >>
> >> I could be wrong.
> > So could be me ;=)
> >
> >>
> >>>>
> >>>> Baloo has certain speed requirements if it is to be used with krunner,
> >>>> and we want instant feedback. This was an integral requirement.
> >>> I doubt e.g. tracker has different requirements, as it is used in
> >>> similar places by GNOME.
> >>>
> >>> But all that aside, do you have a proposal how to fix up the current
> >>> situation?
> >>> Are you willing to invest some work to fix the current issues, or do you
> >>> have an idea what would be a good way to tackle them?
> >>>
> >>
> >> I probably will not work more in Baloo.
> >>
> >> I'll have to investigate the problems a bit more. From the cursory
> >> look of this thread, it doesn't seem that the problems are that dire.
> >> But I may not be reading into it correctly.
> > What would be highly appreciated would be a bit of documentation on what
> > the different pieces do and stuff like that, even if you have no time to
> > code.
> >
> > Greetings
> > Christoph
> >
> > --
> > ----------------------------- Dr.-Ing. Christoph Cullmann ---------
> > AbsInt Angewandte Informatik GmbH Email: cullmann@AbsInt.com
> > Science Park 1 Tel: +49-681-38360-22
> > 66123 Saarbrücken Fax: +49-681-38360-20
> > GERMANY WWW: http://www.AbsInt.com
> > --------------------------------------------------------------------
> > Geschäftsführung: Dr.-Ing. Christian Ferdinand
> > Eingetragen im Handelsregister des Amtsgerichts Saarbrücken, HRB 11234
>
> --
> ----------------------------- Dr.-Ing. Christoph Cullmann ---------
> AbsInt Angewandte Informatik GmbH Email: cullmann@AbsInt.com
> Science Park 1 Tel: +49-681-38360-22
> 66123 Saarbrücken Fax: +49-681-38360-20
> GERMANY WWW: http://www.AbsInt.com
> --------------------------------------------------------------------
> Geschäftsführung: Dr.-Ing. Christian Ferdinand
> Eingetragen im Handelsregister des Amtsgerichts Saarbrücken, HRB 11234
>