From: Ignacio Serantes
To: kde-devel
Date: Thu, 12 Dec 2013 12:01:29 +0000
Subject: Re: Nepomuk in 4.13 and beyond
X-MARC-Message: https://marc.info/?l=kde-devel&m=138684978717692

Welcome Baloo,

Some new suggestions about the development direction, to avoid some problems related to Nepomuk:

1) Baloo must work as a service, to share information with other users and minimize resource consumption. With Nepomuk a login is required, and in a multiuser environment this is a problem.
2) Data must be stored in one repository, to improve information sharing with other users on the same computer or on other computers.
3) Remote installation would be a good solution when you have several computers in your home or office, some of them old or running different operating systems, because some users prefer sharing data over speed. With cheap cloud computing, having your own server running some services (ownCloud, mpd, Quassel, etc.) will become more common, so considering this for the future would be great.
4) Baloo and Milou must compile and work without Akonadi.

On Thu, Dec 12, 2013 at 11:46 AM, Vishesh Handa <me@vhanda.in> wrote:
> Hey everyone
>
> During the KDE 4.11 cycle Nepomuk reached a maturity level that we were happy with: it is reasonably fast and stable, and unless used together with Akonadi it is no longer the "CPU consumer" it was before. We reached this state after years of analyzing what was wrong and what could be improved, to the point where we no longer think any further improvement is possible by modifying our code alone.
>
> The next place where we could seek improvement was the RDF storage.
> We have been using Virtuoso for about 4 years and it's been a game changer for us, performing way faster and more efficiently than any of the alternatives we used before. But as many of you know (and others suffer), it is not an RDF storage designed for the desktop, and it never will be. Since nothing better than Virtuoso exists for our use case, we started to implement our own RDF storage mechanism (codenamed Vishuoso).
>
> At some point during that process we took a step back and re-analyzed the problems of the workspace and the current implementation. The problems are:
>
> - Resource Description Framework (RDF)
> The biggest problem with RDF is that it raises the knowledge needed to contribute to a point where most developers decide to skip it. After all these years only a handful of brave developers have worked with it, and the experience hasn't been good.
>
> So we need something easier to use, so that we can see broader adoption.
>
> Additionally, RDF is a very flexible way to store data; it is, however, not the most efficient way. Data is generally completely normalized even though that is quite often not required. E.g. one doesn't need to store music file artists as a separate contact. This is great from a theoretical point of view, but it is not very useful in practice.
>
> - RDF Storage
> There is no existing RDF storage designed to work on a desktop. Virtuoso is great, but it quickly uses hundreds of megabytes of RAM and it has its own share of problems. The other alternative is Tracker, but it lacks certain features required by Nepomuk.
>
> - Data duplication
> Nepomuk has been used as both a search store and a data store. This results in massive data duplication and synchronization problems.
> In the case of Akonadi, emails are stored in Akonadi, then duplicated in Virtuoso, and then duplicated again in Virtuoso's index. Every time data is changed in Akonadi it has to be updated in Nepomuk, and vice versa. This results in a process being responsible just for synchronizing the two stores.
>
> - API Duplication
> With the data residing in both Nepomuk and other stores (Akonadi/Files/etc.), it isn't always clear which store it should be accessed from. This essentially results in duplicated APIs. E.g. using KABC vs. accessing contacts from Nepomuk.
>
> These problems would still exist even if we had the fastest and most efficient RDF storage in the world.
>
> At this point it was clear to us that the future was not going to be RDF. The next thing we did was to analyze our actual needs based on the last 5 years of Nepomuk.
>
> Our needs are:
> * Full text index for searching
> * Data store for simple objects such as tags / ratings / activities / etc.
> * Relations: forming relations between different objects. E.g. this "file" is related to this "activity" or "person".
> Each of these problems is independently solvable without RDF.
>
> About 2 months ago we started to draft Baloo [1], a metadata solution that will cover the bare necessities of each use case we have.
>
> I'd like to avoid getting into the technical details of the implementation in this thread. Another thread can be started about its different aspects once you've read the basic ideas behind Baloo [2].
>
> Current Plans
> ---------------------
>
> After a month of designing the solution and a month of implementing it, Baloo is working way better than Nepomuk does. So, I'd like to switch to Baloo by default in 4.13, while keeping Nepomuk in maintenance mode for more conservative distributions.
>
> This is not a completely new project, as large parts of the Baloo code are derived from Nepomuk and therefore come with years of testing and real-world use.
> Baloo was also discussed at the PIM Sprint, and the PIM developers are happy to completely drop Nepomuk support for 4.13 and move to Baloo. Similarly, the Telepathy developers are also working on moving KPeople away from Nepomuk.
>
> Migration: there will be an automated migration mechanism for migrating tags, ratings and comments from Nepomuk to Baloo.
>
> Trying it out?
> -------------------
>
> Developers are welcome to try out Baloo and have a look at the current source code. It's still a work in progress, but we strongly feel that it is a step in the right direction.
>
> I'd recommend using Milou [3] for searching.
>
> --
> Vishesh Handa
>
> [1] https://projects.kde.org/projects/playground/base/baloo
> [2] http://techbase.kde.org/Projects/Baloo
> [3] https://projects.kde.org/projects/playground/base/milou
>
> >> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to unsubscribe <<

--
Best wishes,
Ignacio