
List:       openldap-devel
Subject:    libmdb freelist and overflow pages
From:       Howard Chu <hyc () symas ! com>
Date:       2012-11-07 21:34:34
Message-ID: 509AD3EA.6010802 () symas ! com

Quanah Gibson-Mount wrote:
> --On Monday, November 05, 2012 10:10 AM -0800 Howard Chu <hyc@symas.com>
>> So the issue is how to find a contiguous run of pages large enough to
>> satisfy the overflow page, in the current freelist. This takes us into
>> the realm of malloc algorithms, first-fit/best-fit/..., etc.
>>
>> I think first we scan whatever freelist we have in memory, to see if a
>> suitable run of pages is already present.
>>
>> If not, and there are additional freelists still available:
>>     1) we could just merge all of them, and then search again
>> or
>>     2) merge one at a time, and search again
>>
>> Leaning toward #2, I suspect we don't need to coalesce all freelists all
>> the time.
>
> I like the sound of #2 as well.  If you come up with a patch, I can test. ;)
>
> --Quanah

Well, just like last LinuxCon, we've had some new input from this LinuxCon.
Theodore Ts'o (ext4 lead developer) raised the topic of Erase Blocks on 
flash-based storage devices. If we can ensure that our page allocations are 
aligned with the Erase Block size of the data store, we'll get higher write 
throughput on SSDs, MMC cards, etc. Erase Blocks are commonly 32KB or 64KB 
today, with 128KB coming soon.

So, we may want to think about chunking up our page allocations into 
power-of-two chunks. Perhaps as a separate environment flag setting.
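
As a rough illustration of the chunking idea, here is a minimal sketch of
rounding a page number up to the next erase-block boundary. The helper names
and the 4KB page size used in the comments are assumptions for illustration
only; none of this is existing libmdb code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many DB pages fit in one erase block.
 * Assumes the erase block size is a whole multiple of the page size,
 * e.g. a 64KB erase block with 4KB pages gives 16 pages per chunk. */
static size_t pages_per_chunk(size_t erase_block_bytes, size_t page_bytes)
{
    assert(page_bytes != 0 && erase_block_bytes % page_bytes == 0);
    return erase_block_bytes / page_bytes;
}

/* Hypothetical helper: round a page number up to the next chunk boundary.
 * chunk_pages must be a power of two so the mask trick below is valid. */
static size_t chunk_align_up(size_t pgno, size_t chunk_pages)
{
    assert(chunk_pages != 0 && (chunk_pages & (chunk_pages - 1)) == 0);
    return (pgno + chunk_pages - 1) & ~(chunk_pages - 1);
}
```

With 16 pages per chunk, page 17 would round up to page 32, while a page
number already on a boundary is returned unchanged.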

Even if we don't explicitly try to form 64K chunks all the time, it may be 
best for us to fully coalesce all available free lists whenever a request for 
an overflow page arrives. So our default case of single-page requests will 
continue as before, overflow pages will have some chance of reusing old pages, 
and otherwise they'll just use new pages, as they currently do.
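
To make the coalesce-then-search step concrete, here is a minimal sketch:
merge two sorted freelists, then do a first-fit scan for a run of consecutive
page numbers. It assumes freelists are plain arrays sorted in ascending order
with no duplicates; the type and function names are made up for illustration
and libmdb's real ID lists may be laid out differently.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t page_no;  /* assumption: stand-in for the real page number type */

/* Merge two ascending page lists into out, which must have room for
 * na + nb entries. Returns the number of entries written. */
static size_t freelist_merge(const page_no *a, size_t na,
                             const page_no *b, size_t nb, page_no *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}

/* First-fit search: return the index of the first run of `want`
 * consecutive page numbers in fl, or -1 if no such run exists. */
static long freelist_find_run(const page_no *fl, size_t n, size_t want)
{
    size_t start = 0;  /* index where the current run begins */
    assert(want > 0);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && fl[i] != fl[i - 1] + 1)
            start = i;                  /* gap found, begin a new run here */
        if (i - start + 1 >= want)
            return (long)start;
    }
    return -1;
}
```

For example, neither {3, 4, 9} nor {7, 8} alone holds a 3-page run, but the
merged list {3, 4, 7, 8, 9} does, starting at page 7. Single-page requests
never need this scan, which fits keeping the default path as it is.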

Interestingly, the Red Hat folks expressed a desire to adopt MDB in RPM, which 
currently uses BerkeleyDB. Ironically, they'd like an option to run in pure 
append-only mode, to allow rolling back to previous state if one package of a 
large upgrade fails, and the user decides to abandon the entire upgrade.

It would be simple enough to add an environment flag for append-only mode, 
which would skip all of the freelist management entirely. Not interested in 
looking at that yet, can address it later if/when movement happens on the RPM 
project.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
