'[OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue?'


List:       openjdk-2d-dev
Subject:    [OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue?
From:       Dmitri.Trembovetski () Sun ! COM (Dmitri Trembovetski)
Date:       2008-03-25 18:35:08
Message-ID: 47E945DC.2030404 () Sun ! COM


   Hi Clemens,

Clemens Eisserer wrote:
> 3.)  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6253009
> Mentions a deadlock problem that can occur with a separate lock for
> the RenderQueue.
> For my X11 pipeline it would be enough to ensure that only one thread
> accesses xlib at a time; it does not always have to be the queue-flush
> thread. So if I allowed sync()/flushNow() on any thread, the problem
> would not exist, right?
> 
> 4.)
>> If the thread calling sync() sees theInstance as null, this means
>>  that it could not have anything to sync
> As far as I understand the JMM, it could be that thread1 has already
> called getInstance() (which creates and sets theInstance), but
> thread2 calls sync() and still sees null. I don't know whether a lost
> sync() could be a problem at all.

   It might be a problem in the unlikely scenario where an application
   uses a thread only to call sync() and never calls getInstance()
   from it.
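   A minimal sketch of the visibility question, not the actual
   RenderQueue code: SketchQueue and all of its members are hypothetical
   names. It assumes the instance is created lazily and published
   through a volatile field, so a sync() that runs after the write is
   guaranteed to see it. With a plain (non-volatile) field, a thread
   that never called getInstance() itself could legally still read null.

```java
// Hypothetical stand-in for a render queue; not the real Java 2D API.
final class SketchQueue {
    // volatile: the write in getInstance() happens-before any later
    // read of this field that observes the non-null value.
    private static volatile SketchQueue theInstance;

    private int pending;   // guarded by "this"
    private int flushed;   // guarded by "this"

    static SketchQueue getInstance() {
        SketchQueue q = theInstance;
        if (q == null) {
            synchronized (SketchQueue.class) {
                q = theInstance;
                if (q == null) {
                    q = new SketchQueue();
                    theInstance = q;
                }
            }
        }
        return q;
    }

    // Returns false when no queue exists yet, i.e. there is nothing to sync.
    static boolean sync() {
        SketchQueue q = theInstance;
        if (q == null) {
            return false;
        }
        q.flushNow();
        return true;
    }

    synchronized void enqueue() { pending++; }

    synchronized void flushNow() { flushed += pending; pending = 0; }

    synchronized int flushedCount() { return flushed; }
}
```

   A thread that has called getInstance() itself always sees its own
   write, so only a sync()-only thread is affected by the choice of
   field type.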

> 
> 5.)
>> Anyway, I would suggest that you look at optimizing
>> this later
> Yes, that would probably be best.
> 
> I was just a bit worried which design I should choose.
> The JNI overhead itself (35 cycles, Core2Duo) is so small that I am
> not sure whether the whole Buffered Rendering is a win at all.
> I benchmarked the switch statement that is used to decode the
> command stream on my Core2Duo. Just calling the switch in a loop
> already takes 20 cycles (which is quite reasonable, keeping in mind
> that the generated table jump puzzles the pipeline). Add the overhead
> of stream encoding and inter-thread communication, and I guess it's
> also somewhere between 30-50 cycles per j2d-primitive.
> 
> However, if I could remove most of the locking, which at least on my
> machine seems to add a lot of overhead, this would justify the
> additional code.
> With thread-private buffers, and all threads allowed to flush the
> queue themselves instead of relying on the queue-flush thread to do
> it, it should be possible.

   You're welcome to implement the pipeline however you wish.
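
   For illustration only, here is one way the thread-private-buffer idea
   from the quoted text could look. Every name here is hypothetical, and
   the "device" is just a list standing in for xlib. Each thread appends
   to its own buffer without locking, and a single lock is taken only
   while a buffer is flushed, so only one thread touches the device at
   a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; not part of any existing Java 2D pipeline.
final class PerThreadBufferSketch {
    // The only lock: held while one thread drains its buffer to the device.
    private static final ReentrantLock FLUSH_LOCK = new ReentrantLock();

    // Stand-in for the native device (e.g. xlib); guarded by FLUSH_LOCK.
    private static final List<Integer> DEVICE_LOG = new ArrayList<>();

    // One private command buffer per thread, so enqueue() needs no lock.
    private static final ThreadLocal<List<Integer>> BUFFER =
            ThreadLocal.withInitial(ArrayList::new);

    static void enqueue(int opcode) {
        BUFFER.get().add(opcode);
    }

    // Any thread may flush its own buffer; no dedicated flush thread.
    static void flushNow() {
        List<Integer> buf = BUFFER.get();
        if (buf.isEmpty()) {
            return;
        }
        FLUSH_LOCK.lock();
        try {
            DEVICE_LOG.addAll(buf);
        } finally {
            FLUSH_LOCK.unlock();
        }
        buf.clear();
    }

    // Snapshot of what has reached the device so far.
    static List<Integer> deviceLog() {
        FLUSH_LOCK.lock();
        try {
            return new ArrayList<>(DEVICE_LOG);
        } finally {
            FLUSH_LOCK.unlock();
        }
    }
}
```

   Note that this sketch gives no ordering guarantee between commands
   issued by different threads; whether that matters depends on the
   pipeline.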

   One of the main reasons behind STR (single-threaded rendering) was
   to improve the stability of the OpenGL pipeline, since OpenGL doesn't
   like to be accessed from multiple threads - the JNI overhead
   reduction was a welcome benefit. The same happens to be the case
   for the new Direct3D pipeline.

   If your pipeline doesn't have this restriction and you
   are apparently satisfied with JNI performance, don't
   use STR for your pipeline. After all, the current X11 pipeline
   doesn't use it, and you can still use a few tricks from
   it to reduce the JNI overhead (like doing all validation on
   the Java level).
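
   As a rough sketch of what validation on the Java level can mean (one
   reading of it, with hypothetical names throughout): the Java side
   rejects degenerate requests and remembers which state it last pushed
   to the native side, so each primitive costs at most the native calls
   it really needs. NativeOps is an assumed stand-in for the JNI entry
   points; it is an interface here so the example runs without native
   code.

```java
// Hypothetical sketch; NativeOps stands in for real JNI methods.
final class ValidatingRendererSketch {
    interface NativeOps {
        void setForeground(int pixel);
        void fillRect(int x, int y, int w, int h);
    }

    private final NativeOps ops;
    private int validatedPixel;    // last color pushed to the native side
    private boolean hasValidated;  // false until the first color is pushed

    ValidatingRendererSketch(NativeOps ops) {
        this.ops = ops;
    }

    void fillRect(int pixel, int x, int y, int w, int h) {
        // Argument validation in Java: empty rectangles never cross JNI.
        if (w <= 0 || h <= 0) {
            return;
        }
        // State validation in Java: push the color only when it changed.
        if (!hasValidated || validatedPixel != pixel) {
            ops.setForeground(pixel);
            validatedPixel = pixel;
            hasValidated = true;
        }
        ops.fillRect(x, y, w, h);
    }
}
```

   With this shape, two fills in the same color cost one state call and
   two drawing calls, and a zero-width fill costs none.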

   Thanks,
     Dmitri

