List: bacula-users
Subject: [Bacula-users] Improving the speed of spooling attributes
From: Dan Langille <dan () langille ! org>
Date: 2014-03-23 19:33:28
Message-ID: 9052464A-7B2D-4BC5-BA37-2D66E7508ED8 () langille ! org
In this email, I write about backup times growing over a few months, and about
figuring out why the backups had become so slow.
Conclusion: give your database server as much RAM as you can. Inserting into the
File table requires updating 5 indexes. If those indexes can be held entirely in RAM,
the updates can occur without constant swapping to disk. The amount of RAM you
need to give it varies according to your database size. Too much or too little can
increase the time required.
Ref: Bacula 5.2.12 on FreeBSD 9.2, backing up to disk first, then copying to tape.
Disk storage is raidz2 (more later in post).
The problem: slow backups. Not slow as in the time to back up data, but slow in putting
the attributes into the database.
In this post, when I speak about time, I am referring to the time it takes to spool
the data attributes. Taking this sample job output:
###
23-Mar 05:02 crey-sd JobId 167020: Sending spooled attrs to the Director. Despooling 115,264,052 bytes ...
23-Mar 05:09 bacula-dir JobId 167020: Bacula bacula-dir 5.2.12 (12Sep12):
###
In that example, spooling time is 7 minutes (roughly speaking).
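As a rough sketch of how that 7 minutes can be read off the log, the snippet below subtracts the timestamps of the two lines. The function names are made up for illustration, and because the log lines carry no year, one is supplied as an assumption (2014, the year of this post). It also assumes English month abbreviations.

```python
from datetime import datetime

# Assumption: Bacula's job log omits the year, so we supply one ourselves.
ASSUMED_YEAR = 2014


def parse_stamp(line: str) -> datetime:
    """Parse the leading 'DD-Mon HH:MM' timestamp of a job log line."""
    day_month, clock = line.split()[:2]
    return datetime.strptime(
        f"{day_month} {clock} {ASSUMED_YEAR}", "%d-%b %H:%M %Y"
    )


def despool_minutes(start_line: str, end_line: str) -> float:
    """Minutes between the 'Sending spooled attrs' line and the summary line."""
    delta = parse_stamp(end_line) - parse_stamp(start_line)
    return delta.total_seconds() / 60


start = ("23-Mar 05:02 crey-sd JobId 167020: Sending spooled attrs to the "
         "Director. Despooling 115,264,052 bytes ...")
end = "23-Mar 05:09 bacula-dir JobId 167020: Bacula bacula-dir 5.2.12 (12Sep12):"

print(despool_minutes(start, end))  # 7.0
```

The log only has minute resolution, so the result is accurate to about a minute either way.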
Given that different size backups result in different amounts of data spooling, I
took to measuring the spooling process in MB/s. From a high of 129 MB/s in early
January, it dropped to 73 MB/s by the end of January, and by mid-February it was 5 MB/s.
I suspected the file system, etc., but I was proven wrong. It turned out to be a
database issue.
First, some facts:
* The File table contains about 172 million records. This size ballooned over this
period because of increased backups.
* Logging was not being monitored on the database server
* Localhost connections were blocked by the firewall, thus preventing the autovacuum
process from being initiated
The first problem to solve was dead tuples in the File table. Firewall rules were
altered to allow autovacuum to run.
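One way to see whether dead tuples are piling up is PostgreSQL's pg_stat_user_tables view. The sketch below holds such a query as a string (run it with psql or any client) plus a small helper for the dead-tuple fraction. The helper name and the sample numbers are hypothetical, not measurements from this server. The table name is written in lowercase on the assumption that Bacula creates it unquoted, so PostgreSQL folds it to 'file'.

```python
# Query against PostgreSQL's per-table statistics view.
# Assumption: the File table was created unquoted, so its relname is 'file'.
DEAD_TUPLE_SQL = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'file';
"""


def dead_fraction(n_live_tup: int, n_dead_tup: int) -> float:
    """Share of tuples in a table that are dead (0.0 for an empty table)."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0


# Hypothetical numbers for illustration only.
print(round(dead_fraction(172_000_000, 43_000_000), 2))  # 0.2
```

A last_autovacuum value that is NULL or months old on a busy table would be consistent with autovacuum not running.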
Various database tuning parameters were changed to get an initial vacuum to run in
decent time:
* RAM on this PostgreSQL 9.2.4 server is 16GB
* work_mem = 1GB
* maintenance_work_mem = 1GB
* checkpoint_segments = 512
* checkpoint_completion_target = 0.7
Once an autovacuum was done, things improved. Spooling now took about 45 minutes, giving
us 40 MB/s for spooling attributes. I figured we must be able to do better.
I started playing with SQL by creating my own database table to mirror the temporary
table, ‘batch’. Then I started running the insert query to see what optimizations I
could make. e.g. I ran this query manually:
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
SELECT B.FileIndex, 1, P.PathId, FN.FilenameId, B.LStat, B.MD5, B.DeltaSeq
FROM my_batch B
JOIN Path P ON (B.Path = P.Path)
JOIN Filename FN ON (B.Name = FN.Name);
I always inserted into JobId = 1, which I knew was not a job still in history.
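For anyone who wants to try the shape of this test without touching a production catalog, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL. The schemas are heavily simplified and the rows are invented, so the timing means nothing; it only shows the INSERT ... SELECT with both joins, with JobId named in the column list and the constant 1 supplied in the SELECT.

```python
import sqlite3
import time

# Assumption: SQLite stands in for PostgreSQL; schemas are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Path (PathId INTEGER PRIMARY KEY, Path TEXT UNIQUE);
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE File (FileIndex INTEGER, JobId INTEGER, PathId INTEGER,
                   FilenameId INTEGER, LStat TEXT, MD5 TEXT, DeltaSeq INTEGER);
CREATE TABLE my_batch (FileIndex INTEGER, JobId INTEGER, Path TEXT, Name TEXT,
                       LStat TEXT, MD5 TEXT, DeltaSeq INTEGER);
""")

# Invented sample rows.
conn.executemany("INSERT INTO Path (Path) VALUES (?)", [("/etc/",), ("/usr/",)])
conn.executemany("INSERT INTO Filename (Name) VALUES (?)",
                 [("passwd",), ("hosts",)])
conn.executemany(
    "INSERT INTO my_batch VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 167020, "/etc/", "passwd", "lstat-a", "md5-a", 0),
     (2, 167020, "/etc/", "hosts", "lstat-b", "md5-b", 0),
     (3, 167020, "/usr/", "hosts", "lstat-c", "md5-c", 0)],
)

# Time only the batch insert, always writing JobId = 1.
started = time.perf_counter()
conn.execute("""
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
SELECT B.FileIndex, 1, P.PathId, FN.FilenameId, B.LStat, B.MD5, B.DeltaSeq
FROM my_batch B
JOIN Path P ON (B.Path = P.Path)
JOIN Filename FN ON (B.Name = FN.Name)
""")
elapsed = time.perf_counter() - started

inserted = conn.execute(
    "SELECT COUNT(*), MIN(JobId), MAX(JobId) FROM File"
).fetchone()
print(inserted, f"{elapsed:.6f}s")
```

Against the real database, deleting the JobId = 1 rows between runs keeps each test starting from the same state.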
More details here: https://docs.google.com/document/d/1AVAIi6PmJZZE11N3PLLNtbuxuOES4vCNtXiqoxoP2Xk/edit
I found that these settings helped. They are standard PostgreSQL settings to
optimize queries.
shared_buffers = 3GB (postgresql.conf setting)
kern.ipc.shmmax=4294967296 (/etc/sysctl.conf)
kern.ipc.shmall=4294967296
This dropped the insert time to about 6 minutes. About half of this time is spent
constructing the query.
NOTE: Using 2.5GB or 3.5GB for shared_buffers decreased the throughput.
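A quick arithmetic check on those numbers, as a sketch: it treats PostgreSQL's "GB" as 1024^3 bytes (which is how PostgreSQL reads memory units) and assumes the 16GB of RAM is counted the same way. It looks only at the shmmax byte limit.

```python
GIB = 1024 ** 3

shared_buffers = 3 * GIB   # shared_buffers = 3GB in postgresql.conf
shmmax = 4294967296        # kern.ipc.shmmax from /etc/sysctl.conf, in bytes
ram = 16 * GIB             # assumption: 16GB of RAM means 16 * 1024^3 bytes

print(shmmax / GIB)                     # 4.0
print((shmmax - shared_buffers) / GIB)  # 1.0
print(shared_buffers / ram)             # 0.1875
```

So the shared memory limit is 4 GiB, which leaves 1 GiB of headroom above shared_buffers for PostgreSQL's other shared structures, and shared_buffers is a little under a fifth of the machine's RAM.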
Filesystem background:
This is where the backups are stored on disk (i.e. bacula-sd on server B):
$ zfs get all system/usr/local/bacula
NAME PROPERTY VALUE SOURCE
system/usr/local/bacula type filesystem -
system/usr/local/bacula creation Mon Jul 22 10:25 2013 -
system/usr/local/bacula used 12.9T -
system/usr/local/bacula available 4.32T -
system/usr/local/bacula referenced 8.96T -
system/usr/local/bacula compressratio 1.25x -
system/usr/local/bacula mounted yes -
system/usr/local/bacula quota none default
system/usr/local/bacula reservation none default
system/usr/local/bacula recordsize 128K default
system/usr/local/bacula mountpoint /usr/jails/crey.example.org/usr/local/bacula local
system/usr/local/bacula sharenfs off default
system/usr/local/bacula checksum fletcher4 inherited from system
system/usr/local/bacula compression lz4 local
system/usr/local/bacula atime off inherited from system
system/usr/local/bacula devices on default
system/usr/local/bacula exec on default
system/usr/local/bacula setuid on inherited from system/usr/local
system/usr/local/bacula readonly off local
system/usr/local/bacula jailed off default
system/usr/local/bacula snapdir hidden default
system/usr/local/bacula aclmode discard default
system/usr/local/bacula aclinherit restricted default
system/usr/local/bacula canmount on default
system/usr/local/bacula xattr off temporary
system/usr/local/bacula copies 1 default
system/usr/local/bacula version 5 -
system/usr/local/bacula utf8only off -
system/usr/local/bacula normalization none -
system/usr/local/bacula casesensitivity sensitive -
system/usr/local/bacula vscan off default
system/usr/local/bacula nbmand off default
system/usr/local/bacula sharesmb off default
system/usr/local/bacula refquota none default
system/usr/local/bacula refreservation none default
system/usr/local/bacula primarycache all default
system/usr/local/bacula secondarycache all default
system/usr/local/bacula usedbysnapshots 3.97T -
system/usr/local/bacula usedbydataset 8.96T -
system/usr/local/bacula usedbychildren 0 -
system/usr/local/bacula usedbyrefreservation 0 -
system/usr/local/bacula logbias latency default
system/usr/local/bacula dedup off default
system/usr/local/bacula mlslabel -
system/usr/local/bacula sync standard default
system/usr/local/bacula refcompressratio 1.30x -
system/usr/local/bacula written 19.1G -
system/usr/local/bacula logicalused 15.9T -
system/usr/local/bacula logicalreferenced 11.4T -
The database is stored here (on server B):
$ zfs get all system/usr/local/pgsql
NAME PROPERTY VALUE SOURCE
system/usr/local/pgsql type filesystem -
system/usr/local/pgsql creation Fri May 3 9:38 2013 -
system/usr/local/pgsql used 193G -
system/usr/local/pgsql available 9.75T -
system/usr/local/pgsql referenced 193G -
system/usr/local/pgsql compressratio 2.10x -
system/usr/local/pgsql mounted yes -
system/usr/local/pgsql quota none default
system/usr/local/pgsql reservation none default
system/usr/local/pgsql recordsize 8K local
system/usr/local/pgsql mountpoint /usr/local/pgsql inherited from system
system/usr/local/pgsql sharenfs off default
system/usr/local/pgsql checksum fletcher4 inherited from system
system/usr/local/pgsql compression lz4 local
system/usr/local/pgsql atime off inherited from system
system/usr/local/pgsql devices on default
system/usr/local/pgsql exec on default
system/usr/local/pgsql setuid on inherited from system/usr/local
system/usr/local/pgsql readonly off default
system/usr/local/pgsql jailed off default
system/usr/local/pgsql snapdir hidden default
system/usr/local/pgsql aclmode discard default
system/usr/local/pgsql aclinherit restricted default
system/usr/local/pgsql canmount on local
system/usr/local/pgsql xattr off temporary
system/usr/local/pgsql copies 1 default
system/usr/local/pgsql version 5 -
system/usr/local/pgsql utf8only off -
system/usr/local/pgsql normalization none -
system/usr/local/pgsql casesensitivity sensitive -
system/usr/local/pgsql vscan off default
system/usr/local/pgsql nbmand off default
system/usr/local/pgsql sharesmb off default
system/usr/local/pgsql refquota none default
system/usr/local/pgsql refreservation none default
system/usr/local/pgsql primarycache metadata local
system/usr/local/pgsql secondarycache all default
system/usr/local/pgsql usedbysnapshots 0 -
system/usr/local/pgsql usedbydataset 193G -
system/usr/local/pgsql usedbychildren 0 -
system/usr/local/pgsql usedbyrefreservation 0 -
system/usr/local/pgsql logbias latency default
system/usr/local/pgsql dedup off default
system/usr/local/pgsql mlslabel -
system/usr/local/pgsql sync standard default
system/usr/local/pgsql refcompressratio 2.10x -
system/usr/local/pgsql written 193G -
system/usr/local/pgsql logicalused 137G -
system/usr/local/pgsql logicalreferenced 137G -
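To make the relevant differences between the two datasets easier to spot, here is a small sketch that compares a hand-picked subset of the properties listed above. The choice of which properties to compare is mine.

```python
# Subset of the `zfs get all` values shown above for each dataset.
bacula = {"recordsize": "128K", "compression": "lz4",
          "primarycache": "all", "atime": "off"}
pgsql = {"recordsize": "8K", "compression": "lz4",
         "primarycache": "metadata", "atime": "off"}

# Keep only the properties whose values differ: (bacula value, pgsql value).
differences = {
    prop: (bacula[prop], pgsql[prop])
    for prop in bacula
    if bacula[prop] != pgsql[prop]
}
print(differences)
# {'recordsize': ('128K', '8K'), 'primarycache': ('all', 'metadata')}
```

The 8K recordsize on the database dataset matches PostgreSQL's default 8 kB block size, and primarycache=metadata means ZFS caches only metadata for that dataset, so caching of table and index data falls to PostgreSQL's own shared_buffers.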
--
Dan Langille - http://langille.org