
List:       arts
Subject:    Re: Multithreaded audioiooss for arts
From:       Matthias Welwarsky <matze () stud ! fbi ! fh-darmstadt ! de>
Date:       2001-05-30 21:40:24

Stefan Westerfeld wrote:

> But even with your patch, it can not. Assuming a new takes 500ms, your toss
> driver will underrun, because new data is still produced in the main thread
> (notifyTime), so there will be no new data for 500ms.

Yes, I know. As I said, the patch is more of a proof of concept. I wanted
to show that multithreading definitely offers a chance to improve realtime
behaviour, but I didn't have the time to come up with something complete.
So I took a "minimum change approach".

> So the only way I see (which is not trivial, I agree, but probably the only
> way to go), is putting the production of new data (= flow system) and the
> output of the data (= audio thread) into a separate thread. Inside this
> thread, allocations would not be allowed, nor anything else that could block
> for a long time (open, read, write, hostname lookup, ...).

Well, not necessarily. A certain amount of buffering is allowed, if you
do it in the right place. We won't be able to maintain realtime over the
complete flow from the producer to the consumer, because often we will
have no control over the producer's realtime behaviour. We need to
identify the places where realtime is needed. Feeding the soundcard is
clearly critical; an mp3 decoder is clearly not.

> The problem with that is that you need a transaction-oriented flow system.
> I.e. if you connect module A and B, and module C and B (looking like
>   A  -->  | -->  B
>   C  -->  |
> ), then you don't want that data gets calculated after connecting A and B,
> but before connecting C and B (for instance inside an instrument, that would
> lead to phase distortion).

Understood.

> I have recently designed an engine which can do this together with Tim Janik,
> and it is somewhat working. It also has the advantage that it can distribute
> calculations over several threads (making effective use of multi-processor
> systems).

This sounds promising.

> Anyway, a few comments to toss:
> 
> First of all, you can use Arts::Thread/Arts::Mutex/... for "portable"
> threading support.

Yes, I did a quick hack to get results fast. Another reason why I didn't
use Arts::Thread is that it lacks an interface to set the scheduling
policy and the priority of the created thread. Also, I need semaphores
for the reader/writer locking in my buffer queue code. The buffer queue
does not use "new" or "malloc" and should be realtime capable.
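
A minimal sketch of such a queue, assuming one producer thread and one
consumer thread and using POSIX semaphores. This is not the code from the
patch: the names bufq, bq_init, bq_put, bq_get and bq_fill and the two
size constants are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <semaphore.h>
#include <string.h>

#define BQ_SLOTS     4      /* hypothetical: fragments held in the queue */
#define BQ_SLOT_SIZE 1024   /* hypothetical: bytes per fragment */

/* All storage lives inside the struct, so nothing is allocated after init. */
struct bufq {
    unsigned char data[BQ_SLOTS][BQ_SLOT_SIZE];
    unsigned head;          /* next slot the consumer reads */
    unsigned tail;          /* next slot the producer fills */
    sem_t free_slots;       /* counts empty slots; the producer waits on it */
    sem_t used_slots;       /* counts filled slots; the consumer waits on it */
};

int bq_init(struct bufq *q)
{
    assert(q != NULL);
    q->head = q->tail = 0;
    if (sem_init(&q->free_slots, 0, BQ_SLOTS) != 0)
        return -1;
    if (sem_init(&q->used_slots, 0, 0) != 0) {
        sem_destroy(&q->free_slots);
        return -1;
    }
    return 0;
}

/* Producer side: blocks while the queue is full. */
void bq_put(struct bufq *q, const void *frag)
{
    while (sem_wait(&q->free_slots) != 0)
        ;   /* retry, e.g. after being interrupted by a signal */
    memcpy(q->data[q->tail], frag, BQ_SLOT_SIZE);
    q->tail = (q->tail + 1) % BQ_SLOTS;
    sem_post(&q->used_slots);
}

/* Consumer side: blocks while the queue is empty. */
void bq_get(struct bufq *q, void *frag)
{
    while (sem_wait(&q->used_slots) != 0)
        ;   /* retry, e.g. after being interrupted by a signal */
    memcpy(frag, q->data[q->head], BQ_SLOT_SIZE);
    q->head = (q->head + 1) % BQ_SLOTS;
    sem_post(&q->free_slots);
}

/* Number of filled slots right now, for diagnostics only. */
int bq_fill(struct bufq *q)
{
    int n = 0;
    sem_getvalue(&q->used_slots, &n);
    return n;
}
```

In this sketch the main thread would call bq_put() with freshly produced
fragments, and the audio thread would call bq_get() and then write() the
fragment to the soundcard. Only the two semaphores are shared between
writers; head and tail each belong to one side.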

> Then, it would be interesting to have the soundcard buffering parameters
> (i.e. call artsshell status) with your subjective impressions about increased
> reliability through threading. My personal explanation would be something
> like:
> - if you have a broken soundcard, then blocking writes might be more
>   efficient than non-blocking writes, and select() might expose a few problems
>   of its own

It's not so much about "efficiency", it's definitely more about "sound
quality". I can simply hear the difference between the threaded and the
non-threaded version of the driver.

> - results are only comparable if the effective latency and fragmentation are
>   the same, i.e. if you have extra buffering in the thread of 16k, then you
>   need to give the unthreaded driver 16k more fragment buffer to be comparable

Well, I'm lacking a test platform here. As my soundcard driver is indeed
slightly broken, I cannot vary the fragment settings too much. Certain
settings simply produce completely distorted output. What I can do,
however, is reduce the fragment size and count within my buffering code
to the absolute minimum, so as to minimize the additional buffering.
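
To make the comparability point concrete, here is a small worked example
of the arithmetic. The sample format (44100 Hz, stereo, 16 bit) and the
fragment counts are assumptions picked for illustration, not measured
values, and both function names are hypothetical.

```c
#include <assert.h>

/* Bytes queued ahead of the DAC: soundcard fragments plus any extra queue. */
unsigned long total_buffered_bytes(unsigned fragments, unsigned fragment_size,
                                   unsigned long extra_bytes)
{
    return (unsigned long)fragments * fragment_size + extra_bytes;
}

/* Milliseconds of audio represented by `bytes` of PCM data. */
double buffer_latency_ms(unsigned long bytes, unsigned rate_hz,
                         unsigned channels, unsigned bytes_per_sample)
{
    assert(rate_hz > 0 && channels > 0 && bytes_per_sample > 0);
    double bytes_per_sec = (double)rate_hz * channels * bytes_per_sample;
    return 1000.0 * (double)bytes / bytes_per_sec;
}
```

With these assumptions, 16k of extra buffering in the thread is
buffer_latency_ms(16384, 44100, 2, 2), roughly 93 ms. A threaded setup
with 7 fragments of 1024 bytes plus that 16k queue holds the same number
of bytes as an unthreaded setup with 23 fragments of 1024 bytes, so those
two would be the pair to compare.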

> - linux has difficulties measuring multithreaded programs as opposed to
>   single threaded programs

Huh? I don't understand this.

> So, could you do a benchmark where you exactly describe how many bytes
> were buffered in toss or oss (GETOSPACE), to see whether efficiency really
> increased?

I can try to do this.

> Generally, I think the switching between threads costs a few cycles, and so
> does the synchronization, so that a /sane/ configuration of artsd with -F
> and -S should do better than a multithreaded approach, if your sound card
> driver is implemented properly. But of course, believing something is not
> as good as knowing it ;).

If you do massive buffering, you can always buy reliability at the cost
of efficiency. That's not what I want.

> Anyway for broken sound card drivers (and there are a lot of these these
> days), toss could perform remarkably better, and so I think including it
> into the CVS as optional driver probably makes sense.

ok.

> 
> About the IOManager:
> 
> You might want to have a look at defining the latency debugging in the
> IOManager code for getting figures how long it takes between two select()
> cycles. Priorities are an interesting concept, although I can not say how
> much the gain would be if you prioritize every action of the IOManager, and
> rewrite the main cycle like:
> 
>         1. determine parameters for select (prepare)
>         2. select
>         3. determine which activities need to be done (check)
>         4. pick the first (and only the first) activity and do it (dispatch)

Given a duplex soundcard, which of the events would you serve first?
Capture or playback? Or a timer notification event? You must prioritize,
and you must preempt, too.
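
The four-step cycle quoted above could be sketched as follows. This is
not the aRts IOManager: struct activity, run_one_cycle and drain_one_byte
are hypothetical names, and only read readiness is watched to keep the
example short. It shows the prioritizing half only; a handler that is
already running still cannot be preempted this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical record for one watched descriptor. */
struct activity {
    int fd;
    int priority;               /* larger value = more urgent */
    void (*handler)(int fd);
};

/* Example handler: consume one byte so the descriptor stops being ready. */
void drain_one_byte(int fd)
{
    char c;
    if (read(fd, &c, 1) < 0) {
        /* errors are ignored in this sketch */
    }
}

/*
 * One cycle: prepare, select, check, then dispatch only the most urgent
 * ready activity. Returns its index, or -1 on timeout or error.
 */
int run_one_cycle(struct activity *acts, size_t n, struct timeval *timeout)
{
    fd_set readable;
    int maxfd = -1;
    int best = -1;
    size_t i;

    /* 1. prepare: collect the descriptors to watch */
    FD_ZERO(&readable);
    for (i = 0; i < n; i++) {
        assert(acts[i].fd >= 0 && acts[i].fd < FD_SETSIZE);
        FD_SET(acts[i].fd, &readable);
        if (acts[i].fd > maxfd)
            maxfd = acts[i].fd;
    }

    /* 2. select */
    if (select(maxfd + 1, &readable, NULL, NULL, timeout) <= 0)
        return -1;

    /* 3. check: find the ready activity with the highest priority */
    for (i = 0; i < n; i++) {
        if (!FD_ISSET(acts[i].fd, &readable))
            continue;
        if (best < 0 || acts[i].priority > acts[best].priority)
            best = (int)i;
    }

    /* 4. dispatch that one only; the rest are picked up in the next cycle */
    if (best >= 0)
        acts[best].handler(acts[best].fd);
    return best;
}
```

Every dispatched activity costs one extra select() call here, which is
the syscall overhead mentioned in the quoted text.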

> The important thing here is that to be really fine grained, we should rather
> only do one activity, as in the time that passes for doing this, another,
> higher priority event might have arrived, which we should do first. There
> is also a considerable overhead through additional syscalls that we might
> have through this. Adding a gettimeofday after doing each activity (to allow
> clustering of some activities) could help making a prioritized IOManager
> efficient.

OK, but I must be able to interrupt a low priority activity in favour of
a high priority activity. That is, I must be able to stop decoding an mp3
in order to feed the soundcard. That cannot simply be done with select().
However, doing this with SCHED_FIFO threads is rather simple, because you
just give the thread that feeds the soundcard a higher priority, and it
will interrupt the mp3 decoder once it needs more data. No additional
syscalls needed. The feeder thread simply returns from the blocking
write(audio_fd) call and gets served instantly. Plus, you don't need
extensive buffering for that. You just need to have enough spare cycles
so that the producer can generate data while you serve the soundcard from
your buffer. You don't even need a "short" producer loop because you can
interrupt it.
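
A minimal sketch of raising the feeder to SCHED_FIFO, assuming Linux,
where sched_setscheduler() with a pid of 0 applies to the calling thread.
The helper names fifo_priority and make_realtime are hypothetical, and
the call normally fails with EPERM unless the process has the required
privileges.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Priority `offset` steps above the SCHED_FIFO minimum, clamped to the maximum. */
int fifo_priority(int offset)
{
    assert(offset >= 0);
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    int prio = (offset > hi - lo) ? hi : lo + offset;
    return prio;
}

/*
 * Put the caller under SCHED_FIFO at the given offset above the minimum.
 * Returns 0 on success, otherwise the errno value of the failed call.
 */
int make_realtime(int offset)
{
    struct sched_param param;

    param.sched_priority = fifo_priority(offset);
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
        return errno;
    return 0;
}
```

The feeder thread would call make_realtime() once before entering its
blocking write() loop, while the producer stays at its normal policy, so
the kernel preempts the producer whenever the write() returns.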

best regards,
	Matthias

-- 
Matthias Welwarsky
Fachschaft Informatik FH Darmstadt
Email: matze@stud.fbi.fh-darmstadt.de

"I bet the human brain is a kludge."
		-- Marvin Minsky
