Hi,

Rob Taylor wrote:
> The interesting thing here is even when doing no marshalling and with
> validation disabled, it's still at about 2x against ORBit. What other
> design decisions do you think might be influencing this?

I would say first, raw sockets are super fast. There just is not a lot of work for the kernel to do: when we write(), we switch into the kernel, and I believe it basically memcpy()s the data into the read() buffer on the other side. I'm not a kernel developer, but the point is that the raw unix domain socket case does very little CPU work.

Last time I collected actual data, there was not a clear single hotspot adding the overhead vs. raw sockets; it was just all the stuff we were doing to create and parse messages, queue them up, etc., each little bit of work adding to the total.

If you trace through what happens to send and receive dbus messages, there is quite a bit of code. We have the whole message queue abstraction in DBusConnection, the DBusPendingCall machinery, thread locking, parsing and marshaling messages, and so forth. None of this code is what I would call micro-optimized; there are lots of function calls, lots of abstractions, plenty of malloc(). It's not hard to believe that all this machinery around the message queue and the DBusMessage object is at least as much CPU work as the highly-optimized unix domain socket read()/write(). If we believe that, we'd expect to be at least 2x slower than raw sockets.
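For reference, the raw unix domain socket case we're comparing against is essentially just the following. This is a minimal sketch, not the actual benchmark code; the echo loop, message size, and iteration count are made up for illustration:

#include <sys/socket.h>
#include <unistd.h>

#define ROUND_TRIPS 100000

int main(void)
{
    int fds[2];
    char buf[64] = "ping";
    int i;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return 1;

    if (fork() == 0) {
        /* child: echo every message straight back */
        close(fds[0]);
        while (read(fds[1], buf, sizeof(buf)) > 0)
            write(fds[1], buf, sizeof(buf));
        _exit(0);
    }
    close(fds[1]);

    /* parent: time this loop and divide by ROUND_TRIPS */
    for (i = 0; i < ROUND_TRIPS; i++) {
        write(fds[0], buf, sizeof(buf));
        read(fds[0], buf, sizeof(buf));
    }
    close(fds[0]);
    return 0;
}

One round trip is a write() and a read() on each side; everything in between is the kernel's (fast) problem.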
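Compare that with even the simplest blocking libdbus round trip, which runs through all of the machinery above. Again a sketch; the service name, object path, interface, and method are invented for the example, and error handling is mostly omitted:

#include <dbus/dbus.h>

/* one blocking method call; behind these few calls sit DBusMessage
 * construction, marshaling, the outgoing queue, serial-number
 * matching for the reply, validation, and thread locking */
void ping(DBusConnection *conn)
{
    DBusMessage *msg, *reply;
    DBusError err;
    const char *in = "ping";
    const char *out = NULL;

    dbus_error_init(&err);

    /* the org.example.* names are made up for the example */
    msg = dbus_message_new_method_call("org.example.Echo",
                                       "/org/example/Echo",
                                       "org.example.Echo",
                                       "Echo");
    dbus_message_append_args(msg, DBUS_TYPE_STRING, &in,
                             DBUS_TYPE_INVALID);

    /* queues the message, flushes the socket, then blocks
     * dispatching incoming messages until the reply arrives */
    reply = dbus_connection_send_with_reply_and_block(conn, msg,
                                                      -1, &err);
    dbus_message_unref(msg);

    if (reply != NULL) {
        dbus_message_get_args(reply, &err,
                              DBUS_TYPE_STRING, &out,
                              DBUS_TYPE_INVALID);
        dbus_message_unref(reply);
    }
    dbus_error_free(&err);
}

Each of those calls is individually small, but by the time the reply comes back we have allocated, queued, locked, marshaled, validated, and freed quite a bit, versus the single write()/read() pair above.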
Boiling it down to design decisions, some are libdbus-specific and some are protocol-specific. I don't know how ORBit works in enough detail to compare, so I'll just talk about dbus in an absolute sense. To be clear, this is speculation... profiling is needed.

Example protocol features that I think are potentially slow:

1) Replies are not ordered, as they are in the X11 protocol; you can get replies in a different order than the calls. Also, signals can be interleaved with replies. The result is that a message queue is required to implement the protocol.

2) Strings are used instead of integer IDs for various things, such as object paths and interfaces.

3) Each message has to be walked and validated/unpacked - you could imagine a design without the variable-length, tagged header fields, so that header fields could be read at a fixed offset instead of having to build up an index of them (see the first sketch in the P.S. below).

4) The bus daemon, of course, doubles any round-trip time.

Example libdbus decisions that I think are potentially slow:

1) Thread safety (locks, extra refcounting, etc.).

2) Validation (including security paranoia, e.g. the use of DBusString).

3) Handling out-of-memory often results in elaborate machinery to create "transactions," especially in the bus daemon.

4) The main loop glue gunk adds overhead vs. either a single hardcoded loop or always blocking.

5) DBusMessage adds overhead; a blocking API could avoid creating that intermediate object and marshal app data structures directly to/from the socket. This would break some of the flexibility of libdbus, of course, such as the ability for multiple handlers to get a look at a message.

6) The "object tree" inside DBusConnection - processing a message involves parsing an object path and traversing the tree to find handlers.

7) The iterator approach in the DBusMessage public API, and in the internal marshaling APIs, could be slower than an approach where the app had to provide all the data at once, perhaps even in a single struct with a known format (see the second sketch in the P.S. below).

8) Abstraction layers; the support for multiple transports, multiple ways of doing things (blocking or not, etc.), different thread libraries, DBusString - all these layers add a bit more code.

9) The security and resource limits add some overhead to keep track of how many bytes of messages we have queued, etc.

Most of these things are small... my speculation is that, roughly speaking, since the kernel can read/write a unix domain socket so quickly, a bunch of small things pretty easily add up to a significant overhead relative to raw sockets. If that speculation is right then it will be tough to get close to raw socket performance, since so many of the above decisions are embedded in the API or the protocol. Perhaps making each of these things a bit faster, though, would make a noticeable overall difference. Or maybe, if we're lucky, there are still a couple of big "doh!" slownesses in there that can be killed off for a win.

Havoc
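P.S. Two of the numbered points above are easier to see as code than in prose. First, (3) from the protocol list: today the header ends in a variable-length array of (field code, variant) pairs that has to be walked before any field can be looked up. A hypothetical fixed layout - emphatically not the real wire format - would make every field a struct member at a known offset:

#include <dbus/dbus.h>

/* hypothetical fixed header layout - NOT the real dbus wire format;
 * the real header ends with a variable-length array of tagged fields */
struct fixed_header {
    unsigned char endian;
    unsigned char type;           /* method call, reply, signal... */
    unsigned char flags;
    unsigned char version;
    dbus_uint32_t body_length;
    dbus_uint32_t serial;
    dbus_uint32_t reply_serial;   /* inline, instead of a tagged field */
};

/* lookup becomes a load from a known offset instead of an index walk */
dbus_uint32_t get_reply_serial(const struct fixed_header *h)
{
    return h->reply_serial;
}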
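Second, (7) from the libdbus list. The first half of this sketch is the real DBusMessageIter API; the all-at-once alternative exists only in the comment, because there is no such function:

#include <dbus/dbus.h>

void append_two_ints(DBusMessage *msg)
{
    dbus_int32_t x = 1, y = 2;
    DBusMessageIter iter;

    /* today: one iterator call per field, each doing its own type
     * checking and incremental marshaling */
    dbus_message_iter_init_append(msg, &iter);
    dbus_message_iter_append_basic(&iter, DBUS_TYPE_INT32, &x);
    dbus_message_iter_append_basic(&iter, DBUS_TYPE_INT32, &y);

    /* hypothetical alternative (no such API exists): hand over one
     * struct with a known fixed layout and let the library validate
     * and copy it in a single pass, e.g.
     *
     *     struct point { dbus_int32_t x, y; } p = { 1, 2 };
     *     message_write_struct(msg, "ii", &p, sizeof p);
     */
}

Whether either change would be worth the lost flexibility is exactly the kind of thing profiling would have to show.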