'Re: [Kstars-devel] Replacing file-system by database in KStars'

List:       kstars-devel
Subject:    Re: [Kstars-devel] Replacing file-system by database in KStars
From:       Vijay Dhameliya <vijay.atwork13 () gmail ! com>
Date:       2014-01-29 5:52:37
Message-ID: CAH-KMQgZPaFnZWfYj1iFrRvJpK2=xEc7mGo2wR1tPzmoarCuFw () mail ! gmail ! com

Hi Henry,

Thank you for briefing me on the overall situation in KStars.

I took two things from your explanation:

1) We need to study the structure of the data we handle in KStars and see
whether we can improve the current file-based system or replace it with a
better option, such as a database.

2) We need to find a way to use operating-system facilities to render sky
objects better.

If I have misunderstood, please correct me :)

I will study KStars' data structures and try to come up with a diagram that
gives an overview of the entire data layout, which we can then use as a basis
for improvements. Do you think that would be worthwhile, or should I try a
different approach?

Regards,
Vijay

On Wed, Jan 29, 2014 at 9:44 AM, Henry de Valence
<hdevalence@hdevalence.ca> wrote:

> Hi Vijay,
>
> On January 28, 2014 08:22:08 AM Vijay Dhameliya wrote:
> > Hi guys,
> >
> > Currently, when KStars is launched, it reads the data for each kind of
> > sky object from its own file in loaddata() methods. I have tracked down
> > all the classes where we load data by reading a file.
>
> Indeed, the code KStars uses to load data from the disk is messy and (IMO)
> not as efficient as it could be.
>
> > I researched the topic a bit and found that loading data from a database
> > is always a much better option than doing the same from a file.
>
> The database's data is stored in a file on disk, so loading data from the
> database is loading from a file. It might be faster if KStars' pattern of
> data loading is served well by the database we use, so that we can rely on
> the database's optimized code instead of writing our own.
>
> The problem is that most databases are not actually suited to the kind of
> data we have or to our usage patterns. The data we deal with is primarily
> spatial: we have points on the sphere, with extra metadata describing the
> properties of the objects. Currently, KStars has a somewhat complicated
> system for spatially indexing the data with a hierarchical triangular mesh
> (HTM), and loading the data from files as needed.
>
> In order to replace this with an SQL-based system, we'd need to use a
> database that has support for spatial queries. To the best of my knowledge,
> SQLite does not have such support. It would probably be possible to do
> something with PostgreSQL's PostGIS extension for dealing with geographic
> data, but KStars should not require the user to run and maintain a
> standalone database server, so SQLite is the only SQL option (and we do use
> it for some data).
>
> > If we replace the file-based system with QSql, the pros are:
> >
> > 1) We will not have to ship so many files with KStars
>
> File count is less important than file size; if we're shipping the same
> data, it's unclear that we would see a big reduction in size. Also, it
> makes it harder to keep track of the data we have.
>
> > 2) Loading from a database is quicker than doing the same from a file
>
> (See discussion above)
>
> > 3) Code for load methods will be reduced in size
>
> Yes, this would be really nice, but I think that there may be other
> avenues to do this.
>
> > Cons:
> > 1) I will have to move all the data from files into the database using
> > temporary methods
>
> I'm not quite sure what you mean. We already have to do this for the data
> we have: there's a collection of (as I recall, quite hacky) scripts for
> building the catalog files we use now. If we change our data
> representation, we have to change these, too.
>
> There's also:
>
> 2.   We lose spatial indexing, meaning that we may need to load an entire
> 2 GB catalog for one small region of the sky.
>
> 3.   The only SQL database we can use is SQLite, which is designed to be
> small, not high-performance.
>
> > So I am planning to start coding to replace the file-based system with a
> > database on my local branch.
> >
> > Could you please share your views and suggestions on this? I am sure
> > they will be very helpful to me. :)
>
> I agree that we should rethink the data-handling in KStars, but I think
> that it would be best to take a few steps back first, to see the bigger
> picture.
>
> The first task, in my opinion, is to clearly set out *what data we have*.
> For instance, it would be good to have scripts that fetch the raw datasets
> we use completely automatically and process them into our catalog format,
> so that the entire process of creating the files is written down
> programmatically. Even though we don't need to regenerate the catalogs very
> often, the benefit is that exactly what we do to the source data is
> documented in working, runnable, unit-tested code. Some datasets (AFAIR)
> were assembled by us or by the Stellarium people, in which case those files
> should be treated as the 'raw data'.
>
> The question of how we should store our data is something I've been giving
> some thought to recently, but as I've been busy with school I haven't had
> time to implement a prototype yet. Since it's come up, though, I might as
> well share what I was thinking.
>
> It's possible to run all of our astrocalculations at much higher speed
> (using, e.g., code from my GSoC project), but actually doing this in
> practice is hard, since it requires reworking the data handling of the sky
> components.
>
> Currently, each component manages its own data handling, indexing, etc.,
> usually using the HTM library to compute spatial queries. Different
> components handle things differently -- for instance, the deep star
> component lazy-loads stars in blocks to avoid having to load huge catalogs
> all at once.
>
> One nice thing about most of our data is that it generally doesn't change,
> so our problem should be well-suited to an immutable data structure, which
> gives us thread-safety and bug-avoidance for free. In addition, I think we
> should explore using facilities of the operating system to do the work for
> us. For instance, we could try using mmap (in the form of QFile::map() for
> portability) to map the contents of a binary catalog file directly into the
> virtual address space. The OS then loads data in pages as needed (and
> unloads the pages according to, AFAIR, least-recently-used *when needed*
> [^1]). If we arrange the data in our catalog file(s) to have spatial
> locality (i.e., data near each other in the file are nearby points in the
> sky), then we can have the kernel do the work of resource management /
> loading-unloading for us, greatly simplifying our code.
>
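
A minimal sketch of the memory-mapping idea above, written in Python with the
standard mmap module rather than QFile::map() so that it runs without Qt. The
record layout (three little-endian floats), the declination-band sort key, and
the helper names write_catalog and read_records are assumptions made for
illustration; none of them is KStars' actual catalog format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record: RA (deg), Dec (deg), magnitude as little-endian floats.
RECORD = struct.Struct("<fff")


def write_catalog(path, stars):
    """Write stars so that neighbours on the sky are neighbours in the file."""
    # Assumed ordering: 10-degree declination band first, then RA.
    # A real catalog would more likely sort by trixel or a space-filling curve.
    ordered = sorted(stars, key=lambda s: (int((s[1] + 90.0) // 10), s[0]))
    with open(path, "wb") as f:
        for star in ordered:
            f.write(RECORD.pack(*star))
    return len(ordered)


def read_records(path, first, count):
    """Map the whole file, but decode only records [first, first + count)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Only the pages backing these offsets have to be read from disk.
            return [
                RECORD.unpack_from(view, (first + i) * RECORD.size)
                for i in range(count)
            ]


# Made-up sample data: two southern and two northern stars.
stars = [
    (10.0, 45.0, 5.5),
    (200.0, -35.0, 2.0),
    (12.0, 44.0, 6.25),
    (201.0, -31.0, 3.5),
]
path = os.path.join(tempfile.mkdtemp(), "catalog.bin")
total = write_catalog(path, stars)
northern = read_records(path, 2, 2)  # the two stars in the northern band
print(total, northern)
```

Because the file is sorted by sky region, a query for one region touches one
contiguous byte range, and the kernel decides which pages to keep resident.
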
> Another issue we have is with proper motion. Technically, most of the
> points on the sphere that we have aren't points at all, but are actually
> "dual points" that carry both the data of a point and the first-order
> differential near the point (i.e., the proper motion), which we have to
> take into account when we do queries in the far future. In effect, we have
> for each point a differential equation with initial conditions (the J2000
> positions) and the equation of motion given by the proper motion, and we
> want to be able to do queries like:
>
> "What are all the points within angle alpha of this direction at time t?"
>
> The HTM library we use is not equipped to answer this question -- it only
> deals with points that don't move. So what we do now is go through and
> trash our index, reindexing all the points as we do our simulation. Except
> then there are all kinds of problems, like how fine the reindex interval
> should be, issues about stars in multiple trixels, ... it's a real mess.
>
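
The time-dependent query above can at least be stated brute force, without any
index, which is useful as a reference to test a smarter structure against. This
sketch assumes linear proper motion from the J2000 position, proper motion in
milliarcseconds per year with the RA component already multiplied by cos(dec),
and made-up star tuples; it ignores the breakdown of the linear approximation
near the poles. The function names are hypothetical, not KStars or HTM API.

```python
import math

MAS_TO_DEG = 1.0 / 3.6e6  # milliarcseconds to degrees


def unit_vector(ra_deg, dec_deg):
    """Convert spherical coordinates to a unit vector on the sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def position_at(star, years):
    """Advance a star linearly from its J2000 position.

    star = (ra_deg, dec_deg, pm_ra_cosdec_mas_per_yr, pm_dec_mas_per_yr)
    """
    ra, dec, pm_ra, pm_dec = star
    dec_t = dec + pm_dec * years * MAS_TO_DEG
    ra_t = ra + pm_ra * years * MAS_TO_DEG / math.cos(math.radians(dec))
    return ra_t, dec_t


def cone_search(stars, ra_deg, dec_deg, alpha_deg, years):
    """All stars within angle alpha of a direction, `years` after J2000."""
    centre = unit_vector(ra_deg, dec_deg)
    cos_alpha = math.cos(math.radians(alpha_deg))
    hits = []
    for star in stars:
        moved = unit_vector(*position_at(star, years))
        # Angular distance <= alpha  <=>  dot product >= cos(alpha).
        if sum(a * b for a, b in zip(centre, moved)) >= cos_alpha:
            hits.append(star)
    return hits


# Hypothetical data: one fast mover (10 arcsec/yr in Dec) and one fixed star.
fast = (100.0, 0.0, 0.0, 10000.0)
fixed = (100.5, 0.0, 0.0, 0.0)
now = cone_search([fast, fixed], 100.0, 0.0, 1.0, years=0.0)
later = cone_search([fast, fixed], 100.0, 0.0, 1.0, years=1000.0)
print(len(now), len(later))
```

After 1000 years the fast mover has drifted about 2.8 degrees and drops out of
the one-degree cone, which is exactly the case a static index gets wrong.
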
> This got kind of long since it's sort of a brain dump, but hopefully it
> will stir some discussion.
>
> Cheers,
> Henry
>
> P.S. I'm really sorry I haven't been able to put as much time into KStars
> as I'd like recently.
>
> [^1]: I don't know how Windows decides to unload mmap'd files; I assume
> it's not totally insane, but I guess I don't really care too much about how
> it performs as long as it runs. The more important portability issue, I
> think, is dealing with endianness, but I don't think that this is a huge
> problem. Worst case, stick a BOM at the beginning of the file and, if the
> endianness is wrong, swizzle all the bytes and write the new catalog. Or
> tell packagers to ship compatible files, or something.
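
The byte-order marker idea from the footnote could look roughly like this. The
marker value and the pack_catalog / unpack_catalog helpers are made up for the
example and are not part of any KStars file format; the payload is assumed to
be a flat array of 32-bit floats.

```python
import struct

# Hypothetical 4-byte marker; its bytes must not read the same reversed.
MAGIC = 0x4B535452


def pack_catalog(values, byteorder):
    """Serialize floats with a leading marker in the given byte order."""
    prefix = "<" if byteorder == "little" else ">"
    header = struct.pack(prefix + "I", MAGIC)
    body = struct.pack(prefix + "%df" % len(values), *values)
    return header + body


def unpack_catalog(blob):
    """Detect the writer's byte order from the marker, then decode."""
    for prefix, name in (("<", "little"), (">", "big")):
        (magic,) = struct.unpack_from(prefix + "I", blob, 0)
        if magic == MAGIC:
            count = (len(blob) - 4) // 4
            values = struct.unpack_from(prefix + "%df" % count, blob, 4)
            return name, list(values)
    raise ValueError("marker not found: not a catalog, or corrupt header")


values = [1.5, -2.25, 1024.0]
little_blob = pack_catalog(values, "little")
big_blob = pack_catalog(values, "big")
print(unpack_catalog(little_blob))
print(unpack_catalog(big_blob))
```

A reader that finds the marker in the "wrong" order can either decode with
swapped bytes on every access, as here, or rewrite the file once in native
order so that it can be memory-mapped and used directly.
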
> _______________________________________________
> Kstars-devel mailing list
> Kstars-devel@kde.org
> https://mail.kde.org/mailman/listinfo/kstars-devel
>


