
List:       kde-devel
Subject:    Re: svg crystal?
From:       Josef Weidendorfer <Josef.Weidendorfer () gmx ! de>
Date:       2002-11-13 20:36:29

On Wednesday 13 November 2002 18:01, Maks Orlovich wrote:
> > Hi,
> >
> > I don't think online rendering is much of an issue. There should always
> > be something like a persistent icon cache (even with lazy icon
> > loading...). This should be a (few) big shared pixmap(s) on the X11
> > server that applications could blit the needed icons from.
>
> The problem is in the IPC needed to coordinate all this. I think Waldo did
> a prototype of an icon server, though.. But the thing is, with png icons
> and lazy loading, icon loading time really doesn't seem much of an issue.
> Now if we could somehow lazy-load the Xft/fontconfig indices, and
> lazy-create all those darn dozens of widgets each toolbar creates...
>
> > ksycoca mappable database. I would say another job for a KDED daemon.

What about a few big pixmaps on the X server consisting of a grid of all icons 
for each icon size / active state?
We could calculate a fixed position/icon name association in advance 
(something like kbuildsycoca), plus a shared mapping of boolean flags telling 
whether an icon is already rendered. If not, the application renders it and 
updates the pixmap. No IPC is needed beyond looking up a bit in a mapped area. 
And it does no harm if two apps render the same icon, as they will just 
overwrite the pixmap with the same picture.
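
A minimal sketch of the lookup path I have in mind, assuming a precomputed 
icon name -> grid slot table (built kbuildsycoca-style) and a shared file of 
"already rendered" flags. The file name and renderIconIntoSharedPixmap() are 
made-up placeholders, not existing KDE API:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map the shared "already rendered" flag array (one byte per icon slot).
static unsigned char *mapRenderedFlags(size_t slots)
{
    int fd = open("/tmp/.kde-icon-flags", O_RDWR);   // assumed shared file
    if (fd < 0)
        return 0;
    void *p = mmap(0, slots, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return (p == MAP_FAILED) ? 0 : static_cast<unsigned char *>(p);
}

// Each app checks the flag; only if it is unset does it render the icon and
// blit it into the shared grid pixmap at the slot's fixed position.
void ensureIconRendered(unsigned char *flags, int slot)
{
    if (flags && !flags[slot]) {
        // renderIconIntoSharedPixmap(slot);   // hypothetical rendering step
        flags[slot] = 1;    // harmless if two apps race: same picture twice
    }
}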

> Certainly, making read-only mmap'd data available would make sense. The
> problem is, there are lots and lots of icons - which ones does one load,
> and how does one communicate about loading some others?

As I said above: every app can render an icon itself if it needs it. I assume 
lazy icon loading here. Communication is just setting a flag in the shared 
mapping.

> DCOP doesn't seem to be a good choice - I don't know about the typical
> latency, but for i.e. Konqui broadcasting to other copies of itself about a
> new history entry took something like 12ms*number of processes here.

Yes. Seems quite heavy. What about implementing DCOP by reading/writing 
pixmaps on the X server, using the SHM extension if available? This could 
speed things up a lot.
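
For illustration, a rough sketch of how a client would set up an MIT-SHM 
backed image that the server (and thus other clients) can access without 
pushing the pixel data through the X protocol; how DCOP itself would use such 
a segment is only assumed here, not worked out:

#include <sys/ipc.h>
#include <sys/shm.h>
#include <X11/Xlib.h>
#include <X11/extensions/XShm.h>

// Create an XImage whose data lives in a SysV shared memory segment.
XImage *createSharedImage(Display *dpy, int width, int height,
                          XShmSegmentInfo *shminfo)
{
    if (!XShmQueryExtension(dpy))
        return 0;                               // fall back to plain IPC

    int screen = DefaultScreen(dpy);
    XImage *img = XShmCreateImage(dpy, DefaultVisual(dpy, screen),
                                  DefaultDepth(dpy, screen), ZPixmap,
                                  0, shminfo, width, height);
    shminfo->shmid = shmget(IPC_PRIVATE, img->bytes_per_line * img->height,
                            IPC_CREAT | 0600);
    shminfo->shmaddr = img->data = (char *)shmat(shminfo->shmid, 0, 0);
    shminfo->readOnly = False;
    XShmAttach(dpy, shminfo);                   // the server maps the segment
    return img;
}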

> > Even for .png you have to apply icon effects. A nice thing of SVG is that
> > these icon effects could be simply added to the DOM before rendering.
>
> Yeah, but the icon effect code is there, and works, so it's not much of an
> advantage, I think..

Of course. But the available icon effects are limited. There is no such limit 
with SVG -- just look at the specification ;-) You could even apply the 
desktop-wide icon effects to SVG images on WWW pages browsed in Konqueror :-)
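
As an illustration of what "adding an effect to the DOM" could look like: a 
hypothetical sketch using QDom that inserts a desaturation filter before 
rendering, roughly in the spirit of KIconEffect's "to gray". The element and 
attribute names follow the SVG spec; integration with the actual icon loader 
is assumed, not shown:

#include <qdom.h>

// Insert <filter id="kdeGray"><feColorMatrix type="saturate" values="0"/>
// </filter> and apply it to the root element of the SVG document.
QString addGrayFilter(const QString &svgSource)
{
    QDomDocument doc;
    if (!doc.setContent(svgSource))
        return svgSource;                 // leave broken documents untouched

    QDomElement root = doc.documentElement();

    QDomElement filter = doc.createElement("filter");
    filter.setAttribute("id", "kdeGray");
    QDomElement matrix = doc.createElement("feColorMatrix");
    matrix.setAttribute("type", "saturate");
    matrix.setAttribute("values", "0");
    filter.appendChild(matrix);
    root.insertBefore(filter, root.firstChild());

    root.setAttribute("filter", "url(#kdeGray)");
    return doc.toString();
}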

> > And I don't think that hardware acceleration in SVG drawing will bring
> > the Aside from that, if I start konqueror as file manager showing my
> > home, it WILL need >40 icons from the beginning.
>
> Yeah, in fact Konqueror listing my home dir - 1192 files - spends a
> considerable amount of time just checking for icons in the cache. Don't
> remember the number off top of my head, but it was in the order of seconds.
> And i.e. the changes to the fingerprint function I made cut out quite a bit
> of time for that.

I really don't know anything about the current icon cache. Should look at the 
source...

> > Speculating about the times special code parts take is almost useless.
> > You have to do benchmarks and profiles.
>
> I am basing my observations on about a month's worth of manual
> instrumentation using cycle counters. Painful, but gives a nice overview of
> what's going on.

Wow!

> > I would suggest using oprofile to get the big picture for a konqueror
> > startup. Inclusive kernel time, IPC to dcopserver and kio slaves, X11
> > time needed and so on...
>
> I wouldn't.. Last time I checked, oprofile produces flat profiles, which in
> my experience are nearly useless in this case - i.e. unreadable. The top
> few entries are in single percentage points, and are generic datastructure
> stuff, plus malloc and co (oprofile was great for optimizing Keramik,
> however, since on a stress benchmark the code paths are much simpler). It
> just doesn't give much of an idea on what top-level parts of the code are
> spending the time - but rather than what leaves are. I mean, for instance
> that fingerprint example - the inefficiency was in doing too many string
> concatenations, too often.. But a QString::operator+ or such entry doesn't
> help much to figure out where the problem is - while going "OMG, we spend
> 30ms in KIconEffect::fingerPrint doing string ops" does.

Just wait!
That's not a problem of oprofile itself. If you do profiling with sampling (as 
gprof, oprofile etc. do), you only get this flat information.
To calculate cumulative values, you need information about the function calls 
made in your app. That's a whole different issue.
Look at the gcc manual page for "-fprofile-arcs" and "-ftest-coverage"...
This call information could be combined with the flat info from oprofile, and 
then you can calculate all the needed cumulative values; see the sketch below.
And for the time spent in the kernel, you don't really need function call 
information, as you can't change that code anyway.
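
A minimal sketch of the combination step (the data structures are assumed for 
the example; a real tool would apportion costs using arc counts instead of 
counting each reachable function once):

#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::map<std::string, long long>                 CostMap;   // self costs
typedef std::map<std::string, std::vector<std::string> > CallGraph; // callees

// Crude cumulative (inclusive) cost: self cost of fn plus the self costs of
// everything reachable from it, each counted once; cycles are guarded against.
static long long inclusiveCost(const std::string &fn, const CostMap &self,
                               const CallGraph &calls,
                               std::set<std::string> &visited)
{
    if (!visited.insert(fn).second)
        return 0;                                  // already counted / cycle

    long long cost = 0;
    CostMap::const_iterator s = self.find(fn);
    if (s != self.end())
        cost = s->second;

    CallGraph::const_iterator c = calls.find(fn);
    if (c != calls.end())
        for (size_t i = 0; i < c->second.size(); ++i)
            cost += inclusiveCost(c->second[i], self, calls, visited);

    return cost;
}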

The advantage of oprofile is its use of hardware counters. Want to know how 
many TLB misses the startup of your program generates? No problem.
And the advantage over cachegrind is: no simulation needed.


> KCacheGrind, on the other hand, rocks, thanks a lot for it; really helps get
> overview of the processing. It really is a giant step forward for trying to

Thanks. Most of the credit has to go to the Valgrind coders (Julian and Nick).
Nick wrote the cachegrind part, giving the same flat profile as oprofile does.
I patched cachegrind to get function call information.
I was so fascinated by all these cool things that I felt the need for a GUI 
allowing quick browsing of all the information...

> get things nice and fast. (Although obviously, there are precision
> limitations due to simulation). BTW, what's its strategy for handling

What do you mean by precision limitations? The use of 32-bit values for event 
counters? I will have to add 64-bit support then...

> syscalls timings? That seems like the hardest thing to measure right due to
> simulated->real CPU switch..

Cachegrind simply can't measure this. Valgrind never simulates syscalls; it 
simply calls them. And measuring the real time cost of a syscall isn't of 
much use: you can't relate it to the "synthetic" counters cachegrind 
produces. You know, we calculate a "cycle estimation" out of instruction 
counts, cache misses, read/write accesses and so on. We never know the exact 
relation to real time.
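
For illustration only, such an estimation is just a weighted sum of the event 
counters; the weights below are assumed for the example and are not the ones 
cachegrind/KCachegrind actually uses:

// Hypothetical weighting: one cycle per instruction, plus penalties for
// L1 and L2 cache misses.
long long cycleEstimate(long long instructions,
                        long long l1Misses, long long l2Misses)
{
    return instructions + 10 * l1Misses + 100 * l2Misses;
}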

I would say that oprofile is the solution here, together with call information 
from GCC.
Perhaps I will get some time and write an import filter for oprofile's flat 
info together with the function call info GCC dumps out. Combining these would 
be the real killer feature of KCachegrind (I should rename it to KProfiler 
then).

Josef


