
List:       kde-panel-devel
Subject:    Fwd: Scrap baloo?
From:       Christoph Cullmann <cullmann () absint ! com>
Date:       2016-09-14 22:13:56
Message-ID: 1619031179.21635.1473891236452.JavaMail.zimbra () absint ! com

FYI: if you care about this, please follow kde-frameworks-devel; I guess
cross-posting would only end in pain.

Greetings
Christoph

----- Forwarded Message -----
From: "cullmann" <cullmann@absint.com>
To: "kde-frameworks-devel" <kde-frameworks-devel@kde.org>
Sent: Wednesday, 14 September 2016 23:29:22
Subject: Scrap baloo?

Hi,

first, read this excerpt from my mail to the maintainer thread:

<snip>

Hi,

after looking a bit more at the code, I think there are ATM a lot of things that
need fixing:

1) 32-bit systems: I see no fix; with more than 1GB of index, baloo and all
applications using baloo fail

  see bugs like https://bugs.kde.org/show_bug.cgi?id=356114; there we have the 5GB
  limit, which is now raised for 64-bit, but not for 32-bit

2) Larger filesystems: unfortunately it was decided to ignore the upper 32 bits of
the inodes

/**
 * Convert the QT_STATBUF into a 64 bit unique identifier for the file.
 * This identifier is combination of the device id and inode number.
 */
inline quint64 statBufToId(const QT_STATBUF& stBuf)
{
    // We're losing 32 bits of info, so this could potentially break
    // on file systems with really large inode and device ids
    return devIdAndInodeToId(static_cast<quint32>(stBuf.st_dev),
                             static_cast<quint32>(stBuf.st_ino));
}

=> random breakage, e.g. on my NFS drive here, as the IDs clash and all invariants
no longer hold (e.g. something can be a file and at the same time a directory, ...).

3) No error handling for most lmdb faults (as already mentioned)

4) No error handling for any data corruption: e.g. many places will just loop
endlessly or malloc without bound, like DocumentUrlDB::get(quint64 docId) (we have
bugs for that)

5) lmdb locking issues: if one read-write process crashes, everything else stalls
(or crashes because of 3+4)

6) No resource management or crash handling for the baloo_file_extractor, which
either OOMs you or corrupts the database on crash, leading to 5)
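
To make point 2 concrete, here is a minimal compile-time sketch of the collision.
packId() is a hypothetical stand-in for devIdAndInodeToId(), whose body is not
quoted above; the assumed layout (device id in the upper 32 bits, inode in the
lower 32 bits) and the sample values are illustrative only, not taken from baloo.

```cpp
#include <cstdint>

// Hypothetical stand-in for devIdAndInodeToId(). Assumption: the device id
// goes into the upper 32 bits and the inode into the lower 32 bits.
constexpr std::uint64_t packId(std::uint32_t devId, std::uint32_t inode)
{
    return (static_cast<std::uint64_t>(devId) << 32) | inode;
}

// Mirrors the truncating casts of statBufToId(), but on plain integers
// instead of a QT_STATBUF so that it compiles without Qt.
constexpr std::uint64_t truncatedId(std::uint64_t dev, std::uint64_t ino)
{
    return packId(static_cast<std::uint32_t>(dev),
                  static_cast<std::uint32_t>(ino));
}

// Two made-up inodes on the same device that differ only above bit 31.
constexpr std::uint64_t kDev  = 42;
constexpr std::uint64_t kInoA = 0x0000000100000007ULL;
constexpr std::uint64_t kInoB = 0x0000000000000007ULL;

static_assert(kInoA != kInoB, "the inodes are distinct");
static_assert(truncatedId(kDev, kInoA) == truncatedId(kDev, kInoB),
              "yet both map to the same 64-bit id");
```

Any two inodes on the same device that differ only in their upper 32 bits end up
with the same id, which would be consistent with the clashes seen on the NFS drive.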

CC'd Vishesh; perhaps I am wrong about these issues and misunderstand the code.
Unfortunately, e.g. the database structure is not that well documented, unless I
simply did not find the right docs in the git repository.

</snip>

Now the executive summary, after one more day of looking at the code.

1) 32-bit systems: will never be usable, thanks to lmdb, at least not with
non-trivial index sizes

2) network file system homes: will never be usable, thanks to lmdb (ask its author,
http://lmdb.tech/doc/: "Do not use LMDB databases on remote filesystems, even
between processes on the same host. This breaks flock() on some OSes, possibly
memory map sync, and certainly sync between programs on different hosts.")

3) close to no error handling in the code => see the crash reports; I cleaned up a
bit, but they are piling up:
https://bugs.kde.org/reports.cgi?product=frameworks-baloo&output=show_chart&datasets=CONFIRMED&datasets=ASSIGNED&datasets=REOPENED&datasets=UNCONFIRMED&datasets=RESOLVED&banner=1


4) fundamental problems like: wrong data structure for the index (32-bit inodes in
the 21st century?) and close to zero docs on what it does internally
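
Point 1 comes down to address-space arithmetic: LMDB memory-maps the whole database
file, so the map has to fit into the process's virtual address space. A rough
sketch of that bound follows; fitsInAddressSpace() and GiB are names made up for
this illustration, and the check is deliberately optimistic, since it ignores the
kernel/user split and fragmentation that shrink the usable range further.

```cpp
#include <cstdint>

constexpr std::uint64_t GiB = 1024ULL * 1024ULL * 1024ULL;

// Upper bound only: true if a mapping of mapBytes is smaller than the whole
// address space addressable with pointerBits-wide pointers. A real 32-bit
// process can use noticeably less than this.
constexpr bool fitsInAddressSpace(std::uint64_t mapBytes, unsigned pointerBits)
{
    return pointerBits >= 64 || mapBytes < (1ULL << pointerBits);
}

// The 5GB map mentioned in the bug report (taken as 5 GiB here) cannot even
// be addressed with 32-bit pointers, while it is unproblematic with 64-bit ones.
static_assert(!fitsInAddressSpace(5 * GiB, 32), "5 GiB does not fit in 32 bits");
static_assert(fitsInAddressSpace(5 * GiB, 64), "5 GiB fits in 64 bits");
// 1 GiB fits in theory, but it has to share the space with everything else.
static_assert(fitsInAddressSpace(1 * GiB, 32), "1 GiB fits only in theory");
```

Because every application that opens the index maps it as well, the same limit
applies to all clients of baloo and not only to the indexer.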

Proposal:

Scrap baloo_file* and Co. and just reimplement the public API (modulo the settings
for the then non-existent indexer daemon) to use tracker.

Benefits:

1) Tracker is maintained: https://github.com/GNOME/tracker/graphs/contributors
2) We share the index with GNOME/* and save double indexing on "many" Linux systems
which are not plain KDE Plasma Desktop based
3) We can delete 99% of the code (the question is whether we can also remove the
very buggy extractors from KFileMetaData at some point afterwards).

=> Opinions?

Greetings
Christoph

-- 
----------------------------- Dr.-Ing. Christoph Cullmann ---------
AbsInt Angewandte Informatik GmbH      Email: cullmann@AbsInt.com
Science Park 1                         Tel:   +49-681-38360-22
66123 Saarbrücken                      Fax:   +49-681-38360-20
GERMANY                                WWW:   http://www.AbsInt.com
--------------------------------------------------------------------
Geschäftsführung: Dr.-Ing. Christian Ferdinand
Eingetragen im Handelsregister des Amtsgerichts Saarbrücken, HRB 11234