List:       openldap-software
Subject:    Re: need suggestion on indexing big directory
From:       Howard Chu <hyc () symas ! com>
Date:       2004-06-06 7:34:52
Message-ID: 40C2C91C.8060807 () symas ! com

Quanah Gibson-Mount wrote:
> Note that in *repeated* tests
> I've done, it was always quicker to "slapcat" the entire database and
> then "slapadd" it back in, than to run slapindex. There was some work
> done at one point to fix this problem; I don't recall if it made it into
> 2.2 or not. IIRC there were some unintended side effects, and it was put
> off for now.

Yes, a few different approaches were tried, none with any positive
effect. Testing for the existence of an entry (so that you can avoid
adding it redundantly) took as much execution time as just blindly
adding it and catching the error code when the entry already exists.
It's clear that adding an item that already exists in BDB is not a
no-op: in several cases the size of the underlying database changed even
though the transaction was aborted. I would say this is possibly a BDB
bug, but it's hard to trace the real cause.
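
To make the comparison concrete, here is a minimal sketch of the two
strategies against the raw Berkeley DB C API. This is illustrative
only, not the actual back-bdb code; the function names are made up.

#include <db.h>
#include <stdlib.h>
#include <string.h>

/* Strategy 1: test for existence first, then add. The lookup costs
 * a full tree descent even when the key turns out to be absent. */
int add_if_missing(DB *db, DB_TXN *txn, DBT *key, DBT *data)
{
    DBT tmp;
    int rc;

    memset(&tmp, 0, sizeof(tmp));
    tmp.flags = DB_DBT_MALLOC;
    rc = db->get(db, txn, key, &tmp, 0);
    if (rc == 0) {
        free(tmp.data);
        return DB_KEYEXIST;     /* already there, skip the put */
    }
    if (rc != DB_NOTFOUND)
        return rc;              /* real error */
    return db->put(db, txn, key, data, 0);
}

/* Strategy 2: add blindly and catch the error. With DB_NOOVERWRITE,
 * put() returns DB_KEYEXIST if the key is already present. */
int add_blindly(DB *db, DB_TXN *txn, DBT *key, DBT *data)
{
    return db->put(db, txn, key, data, DB_NOOVERWRITE);
}

As noted above, the get()-then-put() path took about as much time as
simply issuing the put() and handling DB_KEYEXIST.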

> Some comparisons on a 330,000 entry db:
>
> running slapindex where I changed a single attribute to have "sub" as
> well as "eq":
> 26 hours
>
> running "slapcat" then "slapadd" for the same DB with memory cache:
> approx. 2 hours
>
> running "slapcat" then "slapadd" for the same DB with disk cache:
> approx. 6 hours

There's also the fact that slapindex places a doubled demand on the BDB
cache: it requires the entry information in the database to be loaded,
crunched into index data, and then written out to the index databases.
In the slapadd case the entry information is read as plain text, so the
demand on the BDB cache is much smaller; it can all be used for deferred
writes, whereas for slapindex it must serve both reads and writes. Some
of the overhead can be avoided if the entry information is mapped in
directly from its database files, instead of being copied into the BDB
cache. However, BDB generally will not memory-map files over a certain
size, and it won't do it at all if the file has already been used with
the main cache. So to take advantage of memory mapping, you would first
need to raise the size limit (in DB_CONFIG) to accommodate your id2entry
database, and then run db_recover to flush out all the current id2entry
pages, before running slapindex.
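
For the DB_CONFIG step, the relevant directive is set_mp_mmapsize,
which corresponds to the Berkeley DB call of the same name; DB_CONFIG
is read when the environment is opened. A minimal sketch follows. The
512MB figure is only a placeholder; pick a value larger than your
id2entry file.

#include <db.h>

/* Sketch only: raise the size limit below which BDB will memory-map
 * read-only database files, so id2entry.bdb can be mapped directly
 * instead of being copied into the main cache.
 * Equivalent DB_CONFIG line (read at environment open):
 *     set_mp_mmapsize 536870912
 */
int raise_mmap_limit(DB_ENV *env)
{
    return env->set_mp_mmapsize(env, 512U * 1024 * 1024);
}

After that, run db_recover against the environment so the cached
id2entry pages are discarded, then run slapindex.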
-- 
   -- Howard Chu
   Chief Architect, Symas Corp.       Director, Highland Sun
   http://www.symas.com               http://highlandsun.com/hyc
   Symas: Premier OpenSource Development and Support