'Re: [sqlite-dev] w/w concurrency (was sqlite 4 LSM storage)'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       sqlite-dev
Subject:    Re: [sqlite-dev] w/w concurrency (was sqlite 4 LSM storage)
From:       Paul Ruizendaal <pnr () planet ! nl>
Date:       2012-07-13 12:14:06
Message-ID: 660B4292-2629-4522-AF52-8F77E701FFAF () planet ! nl
[Download RAW message or body]


On 12 Jul 2012, at 3:16 , Howard Chu wrote:

> Paul Ruizendaal wrote:
> > 
> > On 11 Jul 2012, at 18:44 , Jonathan S. Shapiro wrote:
> > 
> > > I have also been thinking about this. The problem with multiple writer
> concurrency is that it requires relatively fine-grain locking.
> 
> Yes. And a deadlock detector.

I'm not yet convinced of either assertion. Does it really require fine-grain locking \
and is a deadlock detector unavoidable?

At this point it is perhaps good to note that w/w concurrency (full "snapshot \
isolation") is full ACID, but is not always serializable, and it does not need to be.

Example of three transactions under w/w snapshot isolation:

TX0:
set a=0
set b=0
commit

TX1:
set a=a+1               TX2:
set c=a+b                set b=b+2
commit                    set d=a+b
                                commit

TX1 and TX2 do not conflict and the resulting database state is:
a=1
b=2
c=1
d=2

This outcome is impossible if TX1 occurred either before or after TX2.


> Yeah, "one committer at a time" sounds nice. I've been looking for ways to
> accomplish this in MDB as well, but ultimately I don't believe it's realistic.
Are you sure? If you keep track of the txn number of current read transactions, you \
could defer placing old pages onto the free list until all the txns referring to it \
commit or roll back. You also keep a txn number in each record's payload, it is the \
txn number of the txn that wrote the record. A write transaction acts as a read \
transaction but keeps its updates in a memory tree (the C* tree, or "write set"). \
Upon commit it acquires a global write lock and tests that each on disk element also \
in the write set has a lower or equal txn number, and if so it writes all elements as \
usual. If not, the txn fails and no update takes place (or perhaps is retried).

In fairness, one could view the record txn number as a record level lock, so it is a \
form of fine grained locking. Also, if there are foreign key constraints the front \
end should perform a dummy write on the foreign tuple to make sure that a parallel \
modification is detected (in effect, a dummy write acts if a write lock was in \
place).

A slightly more risky variant is not to keep a record txn number, but to store both a \
'before' and 'after' value in the write set. The check would then be that on commit \
time the record is still the same -- perhaps written many times in between, but now \
the same as it was when the transaction started. This could even be done as an even \
finer grained lock: field level locking.

> I have the feeling that writing the conflict detection handler will drive you
> batty; if you write a very simple-minded one it will lead to too many txn retries.
The very essence of SQLite is doing simple-minded, but highly usable things. Yes, I \
guess that if you have a lengthy write transaction that updates a "hot spot" it would \
fail all the time: the hotspot would nearly always have another (quick) write \
transaction committing before the lengthy one gets its work done. Such a query could \
perhaps take out a global write lock in order to complete (just as every write \
transaction is doing in current SQLite and MDB) -- this would be similar to BEGIN \
IMMEDIATE in nature: http://www.sqlite.org/lang_transaction.html

Paul


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic