List: gdal-dev
Subject: [gdal-dev] Performance and sibling files
From: even.rouault () mines-paris ! org (Even Rouault)
Date: 2008-01-29 14:30:27
Message-ID: 200801292030.22952.even.rouault () mines-paris ! org
Daniel,
I have added your message to ticket #2158 and added you to its CC list,
so you can follow how it evolves.
Best regards,
Even
On Tuesday 29 January 2008 16:19:34, Daniel, you wrote:
> Hello,
>
> We have identified a serious performance problem with the reading of sibling
> files performed in the GDALOpenInfo constructor.
>
> When we commented out lines 123-127 in gdalopeninfo.cpp (the VSIReadDir
> call), the runtime of our application went down from 150 days to 15 days!
> The application is 100% I/O-bound (it uses no CPU time according to the
> task manager).
>
> This is our setup:
>
> 7.5 million small (~20 KB) jpeg files with corresponding world files for a
> total of 15 million files, distributed in 50000 directories (approximately
> 300 files per directory).
>
> The files reside on a fast 15K SAS disk running in a Windows 2003 server
> with 8 cores and 4 GB RAM. The filesystem is NTFS (no compression /
> indexing).
>
> Due to the way the files are organized, neighboring jpeg files are located
> in different directories. This means that we always have to read the entire
> directory in order to open just one file.
>
> Our app needs to read the entire dataset in geographic order.
> Unfortunately, changing the directory layout is not an option.
>
> Reading one complete directory means reading ~1.5 MB of data from disk. The
> data is read non-sequentially, since the NTFS directory structure is a
> B-Tree and FindNextFile returns the contents sorted alphabetically.
> The disk cache gets exhausted after reading 2700 directories. This means
> that we never re-use the previously read directory data.
>
> I realise that this might be quite an unusual case, but it would be very
> nice if the sibling reading in GDALOpenInfo were optional.
>
> I don't think that the changes made in ticket #2158 (
> http://trac.osgeo.org/gdal/ticket/2158) would help in this case since there
> was almost no CPU utilization.
>
> Regards,
> Daniel B?ck
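
The figures quoted above can be cross-checked with a little arithmetic. The following is only a rough sketch, not a measurement: it assumes one full directory listing per JPEG open, no reuse of cached listings (as the message reports), decimal megabytes, and that the entire runtime difference is attributable to the directory scans. The constant and function names are made up for illustration.

```python
# Back-of-envelope check of the numbers reported in the quoted message.
# Assumptions (not from GDAL itself): one directory scan per JPEG open,
# no cache reuse, decimal units (1 TB = 1,000,000 MB).

JPEG_FILES = 7_500_000        # JPEG files opened once each
DIRECTORIES = 50_000          # directories holding the files
DIR_LISTING_MB = 1.5          # data read to list one directory
CACHED_DIRS = 2_700           # listings that fit before the cache is exhausted
DAYS_WITH_SCAN = 150          # reported runtime with the VSIReadDir call
DAYS_WITHOUT_SCAN = 15        # reported runtime with the call commented out


def estimate():
    """Return derived I/O figures as a dictionary."""
    # Directory data read when every open rescans its directory.
    total_listing_mb = JPEG_FILES * DIR_LISTING_MB
    # Directory data read if each directory were listed exactly once.
    one_pass_mb = DIRECTORIES * DIR_LISTING_MB
    # Amount of listing data held when the cache runs out.
    cache_mb = CACHED_DIRS * DIR_LISTING_MB
    # Extra wall-clock time per open attributed to the scan.
    extra_seconds = (DAYS_WITH_SCAN - DAYS_WITHOUT_SCAN) * 86_400
    return {
        "total_listing_mb": total_listing_mb,
        "one_pass_mb": one_pass_mb,
        "cache_mb": cache_mb,
        "extra_seconds_per_open": extra_seconds / JPEG_FILES,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.4f}")
```

Under those assumptions the scans add up to about 11.25 TB of directory data, against roughly 75 GB if each directory were listed only once. The 2700 cached listings come to about 4 GB, which is consistent with the 4 GB of RAM on the server.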