
List:       lucene-dev
Subject:    [REPOST] [Benchmarks] Daniel's numbers
From:       Kelvin Tan <kelvin-lists () relevanz ! com>
Date:       2002-12-11 11:25:53

Please see attached for diff to benchmarks.xml for Daniel's numbers.
Thanks Dan!

Regards,
Kelvin

--------
The book giving manifesto     - http://how.to/sharethisbook




["diff.txt" (text/plain)]

cvs -z9 diff benchmarks.xml (in directory C:\checkout\jakarta-lucene\xdocs\)
Index: benchmarks.xml
===================================================================
RCS file: /home/cvspublic/jakarta-lucene/xdocs/benchmarks.xml,v
retrieving revision 1.1
diff -r1.1 benchmarks.xml
278a279,344
> <subsection name="Daniel Armbrust's benchmarks">
> <p>
> My disclaimer is that this is a very poor "Benchmark". It was not done for raw \
> speed, nor was the total index built in one shot. The index was created on \
> several different machines (all with these specs, or very similar), with each \
> machine indexing batches of 500,000 to 1 million documents. Each of \
> these small indexes was then moved to a much larger drive, where they were all \
> merged together into a big index. This process was done manually, over the course \
> of several months, as the sources became available. </p>
> <ul>
> <p>
> <b>Hardware Environment</b><br/>
> <li><i>Dedicated machine for indexing</i>: no - The machine had moderate to low \
> load. However, the indexing process was single-threaded, so it only took \
> advantage of one of the processors. It usually got 100% of this processor.</li> \
> <li><i>CPU</i>: Sun Ultra 80 4 x 64 bit processors</li> <li><i>RAM</i>: 4 GB \
> Memory</li> <li><i>Drive configuration</i>: Ultra-SCSI Wide 10000 RPM 36GB \
> Drive</li> </p>
> <p>
> <b>Software environment</b><br/>
> <li><i>Java Version</i>: 1.3.1</li>
> <li><i>Java VM</i>: </li>
> <li><i>OS Version</i>: Sun 5.8 (64 bit)</li>
> <li><i>Location of index</i>: local</li>
> </p>
> <p>
> <b>Lucene indexing variables</b><br/>
> <li><i>Number of source documents</i>: 13,820,517</li>
> <li><i>Total filesize of source documents</i>: 87.3 GB</li>
> <li><i>Average filesize of source documents</i>: 6.3 KB</li>
> <li><i>Source documents storage location</i>: Filesystem</li>
> <li><i>File type of source documents</i>: XML</li>
> <li><i>Parser(s) used, if any</i>: </li>
> <li><i>Analyzer(s) used</i>: A home grown analyzer that simply removes \
> stopwords.</li> <li><i>Number of fields per document</i>: 1 - 31</li>
> <li><i>Type of fields</i>: All text, though 2 of them are dates (20001205) that we \
> filter on</li> <li><i>Index persistence</i>: FSDirectory</li>
> <li><i>Index size</i>: 12.5 GB</li>
> </p>
> <p>
> <b>Figures</b><br/>
> <li><i>Time taken (in ms/s as an average of at least 3 
> indexing runs)</i>: For 617271 documents, 209698 seconds (or ~2.5 days)</li>
> <li><i>Time taken / 1000 docs indexed</i>: 340 Seconds</li>
> <li><i>Memory consumption</i>: (java executed with) java -Xmx1000m -Xss8192k so 
> 1 GB of memory was allotted to the indexer</li>
> </p>
> <p>
> <b>Notes</b><br/>
> <li><i>Notes</i>: 
> <p>
> The source documents were XML. The "indexer" opened each document one at a time, \
> ran an XSL transformation on it, and then proceeded to index the stream. The \
> indexer optimized the index every 50,000 documents (on this run), though \
> previously we optimized every 300,000 documents. The performance didn't change \
> much either way. We did no other tuning (RAM Directories, a separate process to \
> pretransform the source material, etc.) to make it index faster. When all of these \
> individual indexes were built, they were merged together into the main index. \
> That process usually took about a day. </p></li>
> </p>
> </ul>
> <p>
> Daniel can be contacted at Armbrust.Daniel at mayo.edu.
> </p>
> </subsection> 
> 
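
As a quick sanity check on the "Figures" entries in the diff above, the sketch below recomputes the derived numbers (seconds per 1000 documents, run length in days, documents per second, average source size) from the raw counts Daniel reported. The class and method names are hypothetical and exist only for illustration. The size calculation assumes decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), which the original does not state.

```java
public class BenchmarkFigures {
    // Raw values copied from the benchmark entry in the diff above.
    static final long DOCS_IN_RUN = 617_271L;
    static final long SECONDS_FOR_RUN = 209_698L;
    static final long TOTAL_DOCS = 13_820_517L;
    static final double TOTAL_SOURCE_GB = 87.3;

    // Seconds needed to index 1000 documents at the observed rate.
    static double secondsPerThousandDocs(long docs, long seconds) {
        return seconds * 1000.0 / docs;
    }

    // Wall-clock run length expressed in days (86,400 seconds per day).
    static double days(long seconds) {
        return seconds / 86_400.0;
    }

    // Indexing throughput in documents per second.
    static double docsPerSecond(long docs, long seconds) {
        return (double) docs / seconds;
    }

    // Average document size in KB.
    // Assumption: decimal units, so 1 GB = 1,000,000 KB.
    static double averageKilobytes(double totalGb, long docs) {
        return totalGb * 1_000_000.0 / docs;
    }

    public static void main(String[] args) {
        System.out.printf("sec / 1000 docs: %.1f%n",
                secondsPerThousandDocs(DOCS_IN_RUN, SECONDS_FOR_RUN));
        System.out.printf("run length (days): %.2f%n", days(SECONDS_FOR_RUN));
        System.out.printf("docs / sec: %.2f%n",
                docsPerSecond(DOCS_IN_RUN, SECONDS_FOR_RUN));
        System.out.printf("avg doc size (KB): %.1f%n",
                averageKilobytes(TOTAL_SOURCE_GB, TOTAL_DOCS));
    }
}
```

Under these assumptions the reported values hold up: roughly 340 seconds per 1000 documents, a bit under 2.5 days for the run, and about 6.3 KB per source document. That works out to just under 3 documents per second on the single processor used.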


["att120.txt" (text/plain)]

--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

