'Re: Multithreaded audioiooss for arts'


List:       arts
Subject:    Re: Multithreaded audioiooss for arts
From:       Stefan Westerfeld <stefan () space ! twc ! de>
Date:       2001-05-30 10:22:56

   Hi!

On Tue, May 29, 2001 at 01:09:30AM +0200, Matthias Welwarsky wrote:
> The main problem with aRts, as it stands, is that it's not really "Realtime". 
> I've had a look at the implementation of the iomanager, dispatcher and 
> audiosubsys classes and it a) uses a mainloop with select(), and b) uses 
> "new" for allocation of stream data and all other kind of buffering.

Well, you are right. aRts cannot, in its current implementation, provide
low latency if you assume that new may take a long time (for instance because
the machine needs to swap out other pages to satisfy the new request).

But even with your patch, it can not. Assuming a new takes 500ms, your toss
driver will underrun, because new data is still produced in the main thread
(notifyTime), so there will be no new data for 500ms.

So the only way I see (which is not trivial, I agree, but probably the only
way to go), is putting the production of new data (= flow system) and the
output of the data (= audio thread) into a separate thread. Inside this
thread, allocations would not be allowed, nor anything else that could block
for a long time (open, read, write, hostname lookup, ...).
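
Code running in such a thread has to work from memory that was set up
beforehand, and it must never wait. One common building block for exchanging
data with it is a fixed-size single-producer/single-consumer ring buffer. The
following is only a rough sketch under that assumption: SampleRing and its
methods are hypothetical names (nothing like this exists in aRts), and it uses
std::atomic from modern C++ for the synchronization.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of aRts.
// Exactly one thread may call push() and exactly one thread may call pop().
class SampleRing {
public:
    // All storage is allocated once here, before the audio thread starts.
    explicit SampleRing(std::size_t capacity)
        : buf_(capacity), head_(0), tail_(0) {
        // A power-of-two capacity lets us wrap indices with a bit mask.
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    }

    // Producer side: returns false instead of blocking when the ring is full.
    bool push(float sample) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == buf_.size()) return false;   // full
        buf_[head & (buf_.size() - 1)] = sample;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: returns false instead of blocking when the ring is empty.
    bool pop(float &sample) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                 // empty
        sample = buf_[tail & (buf_.size() - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    std::vector<float> buf_;
    std::atomic<std::size_t> head_;  // count of samples written so far
    std::atomic<std::size_t> tail_;  // count of samples read so far
};
```

Neither push() nor pop() allocates, locks, or makes a system call, so both
are safe to call from a thread that must not block.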

The problem with that is that you need a transaction-oriented flow system.
I.e. if you connect module A and B, and module C and B (looking like
  A  -->  | -->  B
  C  -->  |
), then you don't want data to be calculated after connecting A and B
but before connecting C and B (inside an instrument, for instance, that would
lead to phase distortion).
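
To illustrate what "transaction-oriented" means here, a toy model: connection
requests are only queued, and the calculation sees them all at once when the
transaction is committed. This is a sketch, not the aRts flow system or the
new engine; FlowGraph, connect(), commit() and activeInputs() are hypothetical
names chosen for the example.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model, not the aRts flow system.
class FlowGraph {
public:
    // Queue a connection. The calculation does not see it yet.
    void connect(const std::string &from, const std::string &to) {
        pending_.push_back(std::make_pair(from, to));
    }

    // Apply all queued connections in one step. A real engine would do
    // this between two calculation cycles, never in the middle of one.
    void commit() {
        for (const auto &c : pending_)
            inputs_[c.second].push_back(c.first);
        pending_.clear();
    }

    // Number of sources feeding `module`, as the calculation sees it.
    std::size_t activeInputs(const std::string &module) const {
        auto it = inputs_.find(module);
        return it == inputs_.end() ? 0 : it->second.size();
    }

private:
    std::vector<std::pair<std::string, std::string> > pending_;
    std::map<std::string, std::vector<std::string> > inputs_;
};
```

With this model, calling connect("A", "B") and then connect("C", "B") leaves
activeInputs("B") at 0 until commit() runs, after which it is 2. B is never
calculated with only A connected.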

I have recently designed an engine which can do this together with Tim Janik,
and it is somewhat working. It also has the advantage that it can distribute
calculations over several threads (making effective use of multi-processor
systems).

My plan is to migrate to that engine in the future. But there is a lot
of stuff to do:

1. the engine isn't complete yet, missing things are
  - cycles
  - timestamps
  - debugging ;)
2. the next step would be using the engine, but letting it run inside the
   main thread of aRts (which doesn't solve the problem yet, but allows
   testing of the engine)
3. then, probably excessive benchmarking and profiling needs to be done:
  - does the multithreaded calculation mode (which requires additional locks)
    affect performance?
  - does the communication between engine and aRts affect performance?
  - is the scheduling (which module to calculate when) as effective as the
    old aRts scheduling used to be?
  - what about advanced aRts features like virtualization and multiports?
  - what about busses?
  - does the overall performance for midi synthesis look better or worse than
    before?
4. the next step is to /allow/ modules to be thread-safe - currently all
   modules are written under the assumption that they reside inside the main
   thread - which will be wrong once the engine runs calculations in one or
   more threads other than the main thread - so... basically
   there should be a new AdvancedSynthModule (: public StdSynthModule) which
   threadsafe implementations of SynthModules can derive from
5. modules can be ported one by one to the threadsafe form (whereas some
   modules probably can never be ported), so that if you use only these
   (which then can't do mcop, malloc, locking, ...), you can guarantee
   very low latencies (< 3ms) reliably ... if your underlying kernel can
   (i.e. a kernel with the linux-low-latency patch)

So basically, that is the long-term plan I have for getting rid of the problem
completely without breaking existing design too much. But it definitely is
a huge piece of work... ;)

Anyway, a few comments on toss:

First of all, you can use Arts::Thread/Arts::Mutex/... for "portable"
threading support.

Then, it would be interesting to have the soundcard buffering parameters
(i.e. call artsshell status) alongside your subjective impressions about
increased reliability through threading. My personal explanation would be
something like:
- if you have a broken soundcard, then blocking writes might be more
  efficient than non-blocking writes, and select() might expose a few problems
  of its own
- results are only comparable if the effective latency and fragmentation are
  the same, i.e. if you have extra buffering in the thread of 16k, then you
  need to give the unthreaded driver 16k more fragment buffer to be comparable
- linux has difficulties measuring multithreaded programs as opposed to
  single threaded programs

So, could you do a benchmark where you describe exactly how many bytes
were buffered in toss or oss (GETOSPACE), to see whether efficiency really
increased?
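
For the comparison, the numbers can be worked out as follows. This is a
sketch with assumptions: BufferInfo is a stand-in that mirrors the fragstotal,
fragsize and bytes fields of the audio_buf_info structure OSS fills in for
ioctl(fd, SNDCTL_DSP_GETOSPACE, &info), and the function names are made up
for this example. `bytes` is the space that can still be written without
blocking, so the amount currently queued is the total buffer size minus that,
plus whatever a threaded driver buffers on top.

```cpp
#include <cassert>

// Stand-in for the relevant fields of OSS's audio_buf_info
// (hypothetical type, so the arithmetic works without <sys/soundcard.h>).
struct BufferInfo {
    int fragstotal;  // total number of fragments in the driver buffer
    int fragsize;    // size of one fragment in bytes
    int bytes;       // bytes that can be written without blocking
};

// Bytes currently queued for playback: what sits in the driver buffer,
// plus any extra buffering done outside the driver (e.g. in a thread).
long bufferedBytes(const BufferInfo &info, long extraBuffered) {
    assert(info.bytes <= info.fragstotal * info.fragsize);
    return static_cast<long>(info.fragstotal) * info.fragsize
           - info.bytes + extraBuffered;
}

// Playback time in milliseconds that a given number of bytes represents.
double latencyMs(long bytes, int rate, int channels, int bytesPerSample) {
    return 1000.0 * bytes
           / (static_cast<double>(rate) * channels * bytesPerSample);
}
```

As an example, 16k (16384 bytes) of extra buffering at 44100 Hz, 16 bit
stereo is 16384 / (44100 * 2 * 2) seconds, which is roughly 93 ms of
additional latency that the unthreaded driver would have to be given as well.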

Generally, I think the switching between threads costs a few cycles, and so
does the synchronization, so that a /sane/ configuration of artsd with -F
and -S should do better than a multithreaded approach, if your sound card
driver is implemented properly. But of course, believing something is not
as good as knowing it ;).

Anyway, for broken sound card drivers (and there are a lot of these, these
days), toss could perform remarkably better, and so I think including it
in the CVS as an optional driver probably makes sense.

About the IOManager:

You might want to have a look at enabling (defining) the latency debugging in
the IOManager code to get figures for how long it takes between two select()
cycles. Priorities are an interesting concept, although I cannot say how
much the gain would be if you prioritize every action of the IOManager, and
rewrite the main cycle like:

	1. determine parameters for select (prepare)
	2. select
	3. determine which activities need to be done (check)
	4. pick the first (and only the first) activity and do it (dispatch)

The important thing here is that to be really fine grained, we should do
only one activity per cycle, as in the time that passes while doing it,
another, higher priority event might have arrived, which we should do first.
This also adds considerable overhead through additional syscalls, since
every activity then costs a select(). Adding a gettimeofday after doing each
activity (to allow clustering of some activities) could help make a
prioritized IOManager efficient.
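
The four steps of such a cycle could look roughly like this. It is a sketch
under assumptions, not the aRts IOManager: Activity and runOnce() are
hypothetical names, only readable file descriptors are watched, and the
gettimeofday-based clustering is left out.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical type, not the aRts IOManager API.
struct Activity {
    int fd;                        // descriptor watched for readability
    int priority;                  // larger value = more urgent
    std::function<void()> action;  // what to do when fd is readable
};

// Runs one cycle. Returns the index of the activity that was dispatched,
// or -1 if nothing became ready before the timeout (or select() failed).
// All descriptors must be smaller than FD_SETSIZE.
int runOnce(std::vector<Activity> &activities, struct timeval timeout) {
    // 1. prepare: determine the parameters for select()
    fd_set readfds;
    FD_ZERO(&readfds);
    int maxfd = -1;
    for (const Activity &a : activities) {
        FD_SET(a.fd, &readfds);
        maxfd = std::max(maxfd, a.fd);
    }

    // 2. select
    if (select(maxfd + 1, &readfds, nullptr, nullptr, &timeout) <= 0)
        return -1;

    // 3. check: among the ready activities, find the most urgent one
    int best = -1;
    for (std::size_t i = 0; i < activities.size(); ++i) {
        if (!FD_ISSET(activities[i].fd, &readfds)) continue;
        if (best < 0 || activities[i].priority > activities[best].priority)
            best = static_cast<int>(i);
    }

    // 4. dispatch: do the first (and only the first) activity
    if (best >= 0) activities[best].action();
    return best;
}
```

Anything that was ready but not dispatched stays readable and is picked up
by the next call, after select() has had a chance to report newer, more
urgent events. That is also where the extra syscall cost comes from.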

One main source of unnecessary latency in aRts might be the fact that
Synth_PLAY doesn't use IOType::reentrant, though. It doesn't do this, because
this might lead to doing calculations although the caller was just about to
shuffle connections (we need a transactional flow system - really ;), and
did an MCOP call, possibly without knowing he did one.

   Cu... Stefan
-- 
  -* Stefan Westerfeld, stefan@space.twc.de (PGP!), Hamburg/Germany
     KDE Developer, project infos at http://space.twc.de/~stefan/kde *-         
