
List:       zope-dev
Subject:    [Zope-dev] Storm/ZEO deadlocks (was Re: [ZODB-Dev] [announce] NEO 1.0 - scalable and redundant stora
From:       Marius Gedminas <marius () gedmin ! as>
Date:       2012-08-30 16:14:49
Message-ID: 20120830161449.GA21522 () platonas

On Wed, Aug 29, 2012 at 06:30:50AM -0400, Jim Fulton wrote:
> On Wed, Aug 29, 2012 at 2:29 AM, Marius Gedminas <marius@gedmin.as> wrote:
> > On Tue, Aug 28, 2012 at 06:31:05PM +0200, Vincent Pelletier wrote:
> >> On Tue, 28 Aug 2012 16:31:20 +0200,
> >> Martijn Pieters <mj@zopatista.com> wrote :
> >> > Anything else different? Did you make any performance comparisons
> >> > between RelStorage and NEO?
> >>
> >> I believe the main difference compared to all other ZODB Storage
> >> implementations is the finer-grained locking scheme: in all storage
> >> implementations I know, there is a database-level lock during the
> >> entire second phase of 2PC, whereas in NEO transactions are serialised
> >> only when they alter a common set of objects.
> >
> > This could be a compelling point.  I've seen deadlocks in an app that
> > tried to use both ZEO and PostgreSQL via the Storm ORM.  (The thread
> > holding the ZEO commit lock was blocked waiting for the PostgreSQL
> > commit to finish, while the PostgreSQL server was waiting for some other
> > transaction to either commit or abort -- and that other transaction
> > couldn't proceed because it was waiting for the ZEO lock.)
> 
> This sounds like an application/transaction configuration problem.

*shrug*

Here's the code to reproduce it: http://pastie.org/4617132

> To avoid this sort of deadlock, you need to always commit in a
> consistent order.  You also need to configure ZEO (or NEO)
> to time-out transactions that take too long to finish the second phase.

The deadlock happens in tpc_begin() in both threads, which is the first
phase, AFAIU.

AFAICS Thread #2 first performs tpc_begin() for ClientStorage and takes
the ZEO commit lock.  Then it enters tpc_begin() for Storm's
StoreDataManager and blocks waiting for a response from PostgreSQL --
which is delayed because the PostgreSQL server is waiting to see if
the other thread, Thread #1, will commit or abort _its_ transaction, which
is conflicting with the one from Thread #2.

Meanwhile Thread #1 is blocked in ZODB's tpc_begin(), trying to acquire the
ZEO commit lock held by Thread #2.
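
The cycle described above can be modelled without ZEO, Storm or
PostgreSQL at all.  What follows is only a minimal sketch: the two
threading.Lock objects are stand-ins (my assumption, not real APIs) for
the ZEO commit lock and for whatever PostgreSQL makes the second
transaction wait on, and simulate() and worker() are hypothetical names.
Timeouts are used so the script reports the stall instead of hanging.

```python
import threading

# Stand-ins for the two contended resources (assumptions for illustration):
zeo_commit_lock = threading.Lock()  # the storage-wide ZEO commit lock
pg_row_lock = threading.Lock()      # the conflicting PostgreSQL row/transaction


def simulate(timeout=0.2):
    """Run both threads and report whether each got its second lock."""
    results = {}
    both_hold = threading.Barrier(2)   # both threads hold their first lock
    both_tried = threading.Barrier(2)  # both threads finished their attempt

    def worker(name, first, second):
        with first:
            both_hold.wait()
            # Each thread now waits for the lock the other one holds.
            got = second.acquire(timeout=timeout)
            results[name] = got
            # Keep holding `first` until the other thread has also tried,
            # so neither attempt succeeds just because of a timeout race.
            both_tried.wait()
            if got:
                second.release()

    # Thread #1 already holds the PostgreSQL side and wants the ZEO lock.
    t1 = threading.Thread(
        target=worker, args=("thread-1", pg_row_lock, zeo_commit_lock))
    # Thread #2 holds the ZEO lock and waits on PostgreSQL.
    t2 = threading.Thread(
        target=worker, args=("thread-2", zeo_commit_lock, pg_row_lock))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


if __name__ == "__main__":
    print(simulate())
```

Neither thread gets its second lock; without the timeouts both would
wait forever, which is what the real application does.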

I'm too fried right now to understand who's at fault here.

Workarounds probably exist (use RelStorage instead of ZEO?  Configure
Storm to use a lower PostgreSQL transaction isolation level?).  Maybe
this problem would go away if Storm always went into tpc_begin() before
ZEO.
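
To illustrate the last idea, here is a hedged sketch of the same kind of
toy model, where every thread takes the two resources in one fixed
order ("storm" before "zeo").  The lock names, run_ordered() and
commit() are hypothetical, and this says nothing about how Storm or ZEO
would actually have to be changed; it only shows that a consistent
acquisition order removes the cycle.

```python
import threading


def run_ordered(rounds=50, timeout=2.0):
    """Have two threads repeatedly take both locks in one fixed order."""
    # Hypothetical stand-ins for the two resources involved in a commit.
    locks = {"storm": threading.Lock(), "zeo": threading.Lock()}
    order = sorted(locks)  # every thread uses the same order: storm, zeo
    completed = []

    def commit(name):
        for _ in range(rounds):
            held = []
            try:
                for key in order:
                    if not locks[key].acquire(timeout=timeout):
                        return  # a stall; this thread is not marked complete
                    held.append(key)
            finally:
                # Release in reverse order of acquisition.
                for key in reversed(held):
                    locks[key].release()
        completed.append(name)

    threads = [
        threading.Thread(target=commit, args=("thread-%d" % i,))
        for i in (1, 2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(completed)


if __name__ == "__main__":
    print(run_ordered())
```

With a shared order a thread can only ever wait for a lock whose holder
is not waiting on it in turn, so both threads finish every round.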

I've pinged the people in #storm on FreeNode about this, but haven't
filed any bugs yet.

Marius Gedminas
-- 
Q: Wanting both frequent updates and stability/support is just wishing for a
   pony!
A: Well, we're riding our ponies to the tune of several billion page views per
   month. Where's your pony? Oh, you didn't get one?
                -- http://meta.wikimedia.org/wiki/Wikimedia_Ubuntu_migration_FAQ



