List:       freedesktop-dbus
Subject:    Re: method timeouts vs systemd activation and slow systems
From:       "David Rheinsberg" <david () readahead ! eu>
Date:       2023-03-08 8:04:38
Message-ID: 45ee4204-b2c3-4c63-ab35-286d83c3dc86 () app ! fastmail ! com

Hi

On Mon, Mar 6, 2023, at 1:39 PM, Simon McVittie wrote:
> There are some timeouts in the "limits" data structure of the message
> bus implementation (dbus-daemon or dbus-broker) which are somewhat
> orthogonal to the client-side method call timeout. In dbus-daemon
> configuration language:
> 
> * service_start_timeout (default 25s)
> * auth_timeout (default 5s)
> * pending_fd_timeout (default 150s)

For the record, we do not use those timeouts in dbus-broker. "Service-start" is under
control of systemd, and for the other two we do not implement the timeouts since we
have resource accounting in both situations (which requires rather recent Linux APIs
and does not necessarily have an equivalent on other platforms).


Regarding the proposal: I can only recommend dropping timeouts in dbus transactions.
We have had excellent experience with this approach. Our reasoning is quite simple:
Any high-level application will have some watchdog infrastructure that already
ensures it is restarted if it is stuck. By adding timeouts to low-level operations
you do not improve the resiliency. The only advantage I see is that you *might*
notice a stuck operation earlier, but really only if your low-level timeouts are
lower than your high-level watchdogs. But the cost is high: you make the entire
system more complex, you suddenly have to deal with conflicting timeouts in different
communication APIs, and you suddenly run into false positives where your timeout was
simply too small and thus unsuitable for the target platform.

I do not believe transaction timeouts for reliable channels like dbus contribute to
the resiliency or recoverability of the system. I also believe those timeouts do not
make sense for low-level APIs. If you ensure your operations can be cancelled, I believe
you should let the upper levels control the timeouts of the entire action rather than
each individual step.
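
As a rough illustration of that last point, here is a minimal Python sketch using
asyncio. It is not dbus code: call_method() is a hypothetical stand-in for a method
call that carries no timeout of its own, and run_with_deadline() is an assumed name
for whatever upper layer owns the single deadline for the entire action.

```python
import asyncio


async def call_method(name: str, delay: float) -> str:
    # Hypothetical stand-in for a bus method call. It has no timeout of its
    # own; it only needs to be cancellable, which awaiting sleep() is.
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def whole_action(delays):
    # The individual steps run back to back without per-step timeouts.
    results = []
    for i, delay in enumerate(delays):
        results.append(await call_method(f"step{i}", delay))
    return results


async def run_with_deadline(delays, deadline: float):
    # One deadline, chosen by the upper layer, covers the entire action.
    try:
        return await asyncio.wait_for(whole_action(delays), timeout=deadline)
    except asyncio.TimeoutError:
        # wait_for() has already cancelled whichever step was in flight.
        return None
```

With this shape there is only one timeout value to tune per platform, and it sits
where the caller knows how long the whole action may reasonably take.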


Lastly, note that whenever you "timeout" a method-transaction in a dbus-client, you
leak the reply-window in the server. Unless the other side eventually replies or
disconnects, you will accumulate those reply-windows and eventually reach your
resource limit.
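
The accumulation can be pictured with a toy model in Python. This is only a sketch
under loud assumptions: ToyBus, max_pending_replies and all method names are made up
for illustration and do not reflect how dbus-daemon or dbus-broker actually account
for reply-windows.

```python
class ToyBus:
    """Toy model of per-caller reply-window accounting (not real bus code)."""

    def __init__(self, max_pending_replies: int):
        self.max_pending_replies = max_pending_replies  # hypothetical quota
        self.pending = {}  # caller -> {serial: callee}

    def method_call(self, caller: str, serial: int, callee: str) -> bool:
        windows = self.pending.setdefault(caller, {})
        if len(windows) >= self.max_pending_replies:
            return False  # quota exhausted, the call is refused
        windows[serial] = callee  # open a reply-window for this call
        return True

    def client_timeout(self, caller: str, serial: int) -> None:
        # A client-side timeout is purely local: the bus never hears about
        # it, so the reply-window stays open.
        return None

    def reply(self, caller: str, serial: int) -> None:
        # A reply from the callee closes the matching window.
        self.pending.get(caller, {}).pop(serial, None)

    def callee_disconnected(self, callee: str) -> None:
        # A disconnect closes every window that was waiting on that callee.
        for windows in self.pending.values():
            for serial in [s for s, c in windows.items() if c == callee]:
                del windows[serial]


def demo():
    bus = ToyBus(max_pending_replies=3)
    accepted = []
    for serial in range(5):
        accepted.append(bus.method_call("client", serial, "stuck-service"))
        bus.client_timeout("client", serial)  # frees nothing
    bus.callee_disconnected("stuck-service")
    after = bus.method_call("client", 5, "stuck-service")
    return accepted, after
```

In this model the client gives up on every call, yet the fourth and fifth calls are
refused because the first three windows are still open; only the disconnect of the
stuck service releases them.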

Thanks
David

