
List:       freedesktop-xorg-devel
Subject:    Re: [RFC][PATCH] Make GetXIDRange O(1) instead of O(N^2)
From:       Roberto Ragusa <mail () robertoragusa ! it>
Date:       2013-05-29 21:10:57
Message-ID: 51A66EE1.7090301 () robertoragusa ! it

On 05/28/2013 08:58 PM, Keith Packard wrote:
> Roberto Ragusa <mail@robertoragusa.it> writes:
> 
> > This patch, in addition to raising the max hash size from 2^11 to 2^16 (2^11 was
> > decided in 1994, or maybe even 1987), adds explicit tracking of free ids. The
> > data structure I've chosen is an rb-tree of free extents.
> 
> I'm not excited at the notion of adding another data structure solely to speed this
> fairly uncommon operation.
> Given that we can extract the necessary information from the existing data
> structure, here are a couple of brief ideas:
> 1) Cache just a few free XID ranges; when XIDs are freed, you can extend or merge
> them together. This would make the operations cheaper at least, and might cover the
> bulk of the performance problems you found.

This would only delay the issue until the moment there is nothing cached
anymore; then you have to solve exactly the same "find free id" problem.
I hate it when performance is good and then degrades mysteriously; this
entire investigation started when I noticed my session was snappy
for a few days and then degenerated to sluggish and then unusable.
I DO have X sessions open for months on my laptop.

> 2) Delay constructing the new data structure until the application actually runs
> out of IDs. Once that happens, you could turn on the free ID tracking. I can
> imagine building this incrementally by finding two free ranges each time the client
> asks for one and adding them both to the new data structure until you fail to find
> any more.

But finding still remains a problem. What are you really improving
with respect to the current situation, which is "find one range and return it"?
The fundamental problem is that the hash structure is completely
anti-range; it distributes consecutive XIDs everywhere (in fact, I briefly
considered not using the last bits of the IDs in the hash calculation).
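
To make the "anti-range" point concrete, here is a tiny stand-alone
illustration; the hash below is only in the general shift/xor style of the
resource hash, with constants of my own choosing, not the actual
dix/resource.c code:

#include <stdio.h>

#define HASH_BITS 11                 /* 2^11 buckets, the current maximum */
#define HASH_MASK ((1u << HASH_BITS) - 1)

/* Illustrative shift/xor hash: the low bits of the ID feed straight
 * into the bucket index, so consecutive XIDs never share a bucket. */
static unsigned hash_xid(unsigned id)
{
    return (id ^ (id >> HASH_BITS) ^ (id >> (2 * HASH_BITS))) & HASH_MASK;
}

int main(void)
{
    /* Eight consecutive XIDs land in eight different buckets, so a free
     * range of 8 IDs is scattered over 8 chains instead of clustering. */
    for (unsigned id = 0x04000128; id <= 0x0400012f; id++)
        printf("%08x -> bucket %u\n", id, hash_xid(id));
    return 0;
}

(Dropping the low bits from the hash would keep such a run in one bucket,
at the cost of longer chains for exactly the consecutive IDs a client
allocates most.)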

> 3) Create an entirely new data structure for the whole resource DB that would make
> finding free extents less expensive. I'd suggest a skip-list as that makes in-order
> traversal really cheap (to find a free extent) while also reducing the average
> per-node cost down close to one pointer.

This is something I tried to explore too.
Fast lookup is essential, I think, and the current hash is almost unbeatable.
The problem is very similar to free-space tracking in filesystems.
The options I considered were a bitmap (fast insert/delete, bad search,
huge memory usage!) or a tree. I avoided the extra complexity of a btree and
settled for an rbtree whose nodes are extents (since ranges are exactly what
we are interested in). A btree of extents could be more efficient memory-wise,
as the payload of each node is really small (an extent is just 2 ints).
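
For reference, this is roughly the shape of the node payload; a minimal
sketch with names of my own choosing, not necessarily the exact ones in the
patch:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t XID;                /* 32-bit resource ID */

/* One maximal run of free XIDs, keyed by its first ID. */
struct free_extent {
    XID first;                       /* first free XID in the run       */
    XID last;                        /* last free XID (inclusive)       */
    /* plus the usual rb-tree linkage (parent/left/right pointers and a
     * colour bit), which is what brings one node to roughly 36 bytes
     * on a 64-bit machine, as in the numbers below                    */
};

int main(void)
{
    /* The payload itself is tiny; the tree linkage dominates. */
    printf("payload: %zu bytes per extent\n", sizeof(struct free_extent));
    return 0;
}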

> In any case, what we really need is credible performance data to use to compare the
> old and new implementation. I think we need data that shows:
> 1) 'real' application performance. The cairo performance stuff using Render is
> probably a good way to measure this.

I do not have an application to stress-test this.
Please give me any pointers to interesting tools; I want to stress the
correctness of the implementation too.

> 2) Memory usage. How much additional memory does this use, and, ideally, how much
> additional memory is getting traversed when manipulating the resource data base.
> Measuring the X server memory usage in a couple of situations with and without the
> changes should suffice.

I have my normal desktop running here on the modified code and I can show you
how big the structure is for some significant cases.
(two days uptime, 94000 seconds of operation, because I suspend during the night)

----------

This is a "normal" client, gkrellm running for two days.
(xrestop line, _complete_ rbtree, rbtree statistics)

4000000     4   35    0  448   50     1576K      2K   1578K  3036 gkrellm

[ 93708.434] (II)                     <44000005:7fffffff>
[ 93708.434] (II)                 041a6da4:43ffffff
[ 93708.434] (II)                     <04196abf:041a6da2>
[ 93708.434] (II)             04196aba:04196abd
[ 93708.434] (II)                     <04196aa8:04196aa9>
[ 93708.434] (II)                 040127db:04196aa1
[ 93708.434] (II)         <040002c3:040127d9>
[ 93708.434] (II)                 040002b3:040002b4
[ 93708.434] (II)             040002a7:040002ac
[ 93708.434] (II)                     <04000297:04000298>
[ 93708.434] (II)                 0400024d:0400028e
[ 93708.434] (II)     0400024a:0400024b
[ 93708.434] (II)                 04000242:04000243
[ 93708.434] (II)             0400021d:0400021e
[ 93708.434] (II)                 04000207:0400020a
[ 93708.434] (II)         <040001f3:040001f4>
[ 93708.434] (II)                 040001dc:040001df
[ 93708.434] (II)             040001c4:040001c7
[ 93708.434] (II)                 040001bb:040001bb
[ 93708.434] (II) 040001b0:040001b9
[ 93708.434] (II)                 040001ac:040001ac
[ 93708.434] (II)             04000194:0400019b
[ 93708.434] (II)                 04000187:0400018b
[ 93708.434] (II)         04000184:04000185
[ 93708.434] (II)                 04000180:04000180
[ 93708.434] (II)                     <0400017a:0400017b>
[ 93708.434] (II)             04000168:0400016f
[ 93708.434] (II)                 04000148:0400014f
[ 93708.434] (II)     <04000128:0400012f>
[ 93708.434] (II)                 04000108:0400010f
[ 93708.434] (II)             040000e8:040000ef
[ 93708.434] (II)                 040000c8:040000cf
[ 93708.434] (II)         040000a8:040000af
[ 93708.434] (II)                 04000088:0400008f
[ 93708.434] (II)             04000069:0400006f
[ 93708.434] (II)                     04000064:04000064
[ 93708.434] (II)                         <04000051:04000052>
[ 93708.434] (II)                 <04000028:04000028>
[ 93708.434] (II)                     00000000:03ffffff
[ 93708.434] (II)   node count:39
[ 93708.434] (II)      inserts:1730244
[ 93708.434] (II)      deletes:1730205
[ 93708.434] (II)      lookups:5194957
[ 93708.434] (II) lookup_evals:43623819
[ 93708.434] (II)         mark_used:1732009
[ 93708.434] (II)    mark_used_omit:2
[ 93708.434] (II)       mark_unused:1731473
[ 93708.434] (II) maybe_mark_unused:1731473

As you can see, that is 1.7M inserts/deletes. Each delete makes 3 lookups (to
detect merging conditions), hence the 5M lookups. 43M nodes were explored, since
the height of the tree is <10, because we only have 39 extents.
The memory used is therefore 39*sizeof(rbtree node), which is 39*36 ~= 1.4KiB
on a 64-bit arch.
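
To show what those 3 lookups per delete are doing, here is a small
self-contained sketch of the merge-on-free logic; it uses a sorted array in
place of the rb-tree to keep the example short, and the names (mark_unused,
free_extent, etc.) are mine, not necessarily the patch's:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t XID;

struct free_extent { XID first, last; };

/* Sorted-array stand-in for the rb-tree: sorted by .first, no overlaps. */
static struct free_extent ext[64];
static int nextents;

/* Stands in for an rb-tree lookup: index of the extent containing id,
 * or -1 if id is not free. */
static int lookup(XID id)
{
    for (int i = 0; i < nextents; i++)
        if (id >= ext[i].first && id <= ext[i].last)
            return i;
    return -1;
}

static void remove_at(int i)
{
    for (; i < nextents - 1; i++)
        ext[i] = ext[i + 1];
    nextents--;
}

static void insert_extent(XID first, XID last)
{
    int i = 0;
    while (i < nextents && ext[i].first < first)
        i++;
    for (int j = nextents; j > i; j--)
        ext[j] = ext[j - 1];
    ext[i].first = first;
    ext[i].last = last;
    nextents++;
}

/* mark_unused: XID id has just been freed. */
static void mark_unused(XID id)
{
    if (lookup(id) >= 0)            /* lookup 1: already free, nothing to do */
        return;
    int below = lookup(id - 1);     /* lookup 2: free neighbour on the left?  */
    int above = lookup(id + 1);     /* lookup 3: free neighbour on the right? */

    if (below >= 0 && above >= 0) { /* bridges two extents: merge into one    */
        ext[below].last = ext[above].last;
        remove_at(above);
    } else if (below >= 0) {
        ext[below].last = id;       /* grow the left extent upward            */
    } else if (above >= 0) {
        ext[above].first = id;      /* grow the right extent downward         */
    } else {
        insert_extent(id, id);      /* isolated hole: brand new 1-ID extent   */
    }
}

int main(void)
{
    mark_unused(0x0400014a);
    mark_unused(0x0400014c);
    mark_unused(0x0400014b);        /* this one merges the two extents above  */
    for (int i = 0; i < nextents; i++)
        printf("%08x:%08x\n", ext[i].first, ext[i].last);
    return 0;
}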

----------

This is a HEAVY client, firefox running for two days with 625 fully loaded
tabs (yes, 625).

(xrestop line, _partial_ rbtree, rbtree statistics)

6a00000   102   54    1 1210 1887    85194K     48K  85243K  8779 Schrödinger's [emoji] and outside-the-box naming [LWN.net] - Mozilla Firefox

...
[ 94023.210] (II)                                         06a13524:06a13556
[ 94023.210] (II)                                     06a13517:06a13522
[ 94023.210] (II)                                             <06a134f5:06a13514>
[ 94023.210] (II)                                         06a134d1:06a134f3
[ 94023.210] (II)             06a1349d:06a134ce
[ 94023.210] (II)                                         <06a13441:06a1349b>
[ 94023.210] (II)                                     06a1341d:06a1343f
[ 94023.210] (II)                                         <06a13417:06a13418>
[ 94023.210] (II)                                 06a133f6:06a13415
[ 94023.210] (II)                                         06a133e3:06a133f4
[ 94023.210] (II)                                     <06a1332f:06a133e0>
[ 94023.210] (II)                                         06a1331c:06a1332d
[ 94023.210] (II)                             06a132cd:06a13319
...
[ 94023.211] (II)   node count:1429
[ 94023.211] (II)      inserts:13679868
[ 94023.211] (II)      deletes:13678439
[ 94023.211] (II)      lookups:89129915
[ 94023.211] (II) lookup_evals:1182562960
[ 94023.211] (II)         mark_used:29572199
[ 94023.211] (II)    mark_used_omit:60
[ 94023.211] (II)       mark_unused:29568982
[ 94023.211] (II) maybe_mark_unused:29569005

With 13M inserts/deletes here, we have 1429 nodes, so about 50KiB of memory for
the structure. The average height of the tree appears to be 13.

----------

Finally, here is the guy who triggered this whole investigation: the plasma-desktop
systray, which is spammed by psi with flashing icons (either psi or plasma-desktop
is leaking pixmaps).

(xrestop line, _partial_ rbtree, rbtree statistics)
            pixmaps------------+
                               v
1600000    65   62    1   90 211194    10738K   4953K  15692K  3266 plasma-desktop

[ 94684.741] (II)         4160001d:4160001f
[ 94684.741] (II)         4160001b:4160001b
[ 94684.741] (II)         41600016:41600016
[ 94684.741] (II)         41600006:41600006
[ 94684.741] (II)         017ffff9:415fffff
[ 94684.741] (II)         <017fffe9:017ffff7>
[ 94684.741] (II)         017fffd9:017fffe7
[ 94684.741] (II)         017fffc9:017fffd7
[ 94684.741] (II)         017fffb9:017fffc7
[ 94684.741] (II)         <017fffa9:017fffb7>
...
[ 94685.003] (II)         01740695:017406a3
[ 94685.003] (II)         01740685:01740693
[ 94685.003] (II)         01740675:01740683
[ 94685.003] (II)         01740665:01740673
[ 94685.003] (II)         01740655:01740663
[ 94685.003] (II)         01740645:01740653
[ 94685.003] (II)         01740635:01740643
[ 94685.003] (II)         01740625:01740633
[ 94685.003] (II)         01740615:01740623
[ 94685.003] (II)         01740605:01740613
[ 94685.003] (II)         017405f5:01740603
[ 94685.003] (II)         017405e5:017405f3
[ 94685.003] (II)         017405d5:017405e3
[ 94685.003] (II)         017405c5:017405d3
[ 94685.003] (II)         017405b5:017405c3
...
[ 94685.727] (II)         0160005c:01600060
[ 94685.727] (II)         01600054:01600058
[ 94685.727] (II)         0160004c:01600050
[ 94685.727] (II)         01600044:01600048
[ 94685.727] (II)         0160003c:01600040
[ 94685.727] (II)         01600034:01600038
[ 94685.727] (II)         01600026:0160002d
[ 94685.727] (II)         01600021:01600021
[ 94685.727] (II)         0160001f:0160001f
[ 94685.727] (II)         01600011:01600019
[ 94685.727] (II)         0160000f:0160000f
[ 94685.727] (II)         01600005:01600007
[ 94685.727] (II)         00000000:015fffff


[ 94685.727] (II)   node count:185743
[ 94685.727] (II)      inserts:1480748
[ 94685.727] (II)      deletes:1295005
[ 94685.727] (II)      lookups:9906008
[ 94685.727] (II) lookup_evals:267201246
[ 94685.727] (II)         mark_used:3352784
[ 94685.727] (II)    mark_used_omit:456
[ 94685.727] (II)       mark_unused:3143170
[ 94685.727] (II) maybe_mark_unused:3143620

We have a pathological case here, with 185,000 nodes (that is 6.5MiB).
Tree height is about 26.
The available XIDs are 0x01600000-0x017fffff.
As you can see from the central part of the tree structure, there is a leak:
for every 16 allocated XIDs, only 15 are freed.
The client has kept incrementing the XID, leaving a lot of holes behind across
the whole range; then the 0x017fffff limit was reached and allocation from the
free XIDs is now happening. The log says, for example:

[ 94683.640] (II) find min=01600000 max=017fffff
[ 94683.640] (II) looking up=01600000:017fffff (over)
[ 94683.640] (II) find returned 016bc194:016bc19a
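
This is the kind of query the tree answers directly. A sketch of it, again
over a sorted array standing in for the rb-tree, with names of my own choosing
and the second extent in the test data made up for illustration:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t XID;

struct free_extent { XID first, last; };

/* Return the first free extent overlapping [min, max], clamped to those
 * bounds; ext[] is sorted by .first.  In the patch this would be an
 * O(tree height) descent instead of a linear scan. */
static int find_free_range(const struct free_extent *ext, int n,
                           XID min, XID max, struct free_extent *out)
{
    for (int i = 0; i < n; i++) {
        if (ext[i].first > max)
            break;                              /* past the window: no match */
        if (ext[i].last < min)
            continue;                           /* entirely below the window */
        out->first = ext[i].first < min ? min : ext[i].first;
        out->last  = ext[i].last  > max ? max : ext[i].last;
        return 1;
    }
    return 0;                                   /* no free IDs in [min, max] */
}

int main(void)
{
    struct free_extent ext[] = {
        { 0x016bc194, 0x016bc19a },             /* the hole the log reports  */
        { 0x016bc1a4, 0x016bc1aa },             /* made-up second hole       */
    };
    struct free_extent r;
    if (find_free_range(ext, 2, 0x01600000, 0x017fffff, &r))
        printf("find returned %08x:%08x\n", r.first, r.last);
    return 0;
}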

> 3) Performance of applications running out of XIDs to know whether the change
> actually fixes the problem. Have you already constructed some way of measuring
> this?

This is easy.
Let psi flash for a few hours. When the number of pixmaps reaches 200,000,
the X server is so sluggish as to be unusable. The patched server works well
even with 800,000 pixmaps (I didn't happen to get more than that).


> However, while the protocol requires that IDs be unique across all resource types,
> the server implementation does not. This allows the X server to associate other
> data with specific IDs and have that freed automatically. Looking up IDs always
> involves the type or class of the ID, so code can be assured of getting the right
> data structure back.

Then I suppose it happens at the server level.
I just see the calls to AddResource and FreeResource, so I was probably wrong
in implying this is caused by clients.

Have a look at mark_used_omit, mark_unused and maybe_mark_unused in the
statistics I pasted above.

Thanks for your comments.
-- 
   Roberto Ragusa    mail at robertoragusa.it

_______________________________________________
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: http://lists.x.org/mailman/listinfo/xorg-devel

