[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    [GENERAL] Corrupt Incrementally Updated Backup: missing pg_clog file
From:       Jürgen_Fuchsberger <juergen.fuchsberger () uni-graz ! at>
Date:       2012-10-31 14:24:27
Message-ID: 5091349B.6000407 () uni-graz ! at
[Download RAW message or body]

Hi all,

I have a problem with a corrupt backup, fortunately I was only testing 
so I did not loose any data. Unfortunetely what I did is to follow the 
backup guidelines in the documentation, which I thought should work 
reliably. Here are the details:

I am running a postgreSQL 8.4 database on a Debian Squeeze system. For 
Backups I am using the warm standby and "Incrementally Updated Backup" 
method as described in chapter 24.4 of the documentation. So my Setup is 
as follows:

Server 1 (Main): PostgreSQL 8.4 Database with archive_mode enabled 
shipping WAL files to a NFS drive. Size of database is about 370 GB and 
growing.

Server 2 (Replica): PostgreSQL 8.4 Database in recovery mode. Using 
pg_standby in recovery.conf and getting WAL files from Server 1 NFS drive.

All this works fine and runs without errors.

The replica is backed up once a week using rsync, a full backup runs 
about 10 hours, so I also keep at least 24h of WAL files to make sure I 
have a consistent backup.

The backup process also runs fine without errors, only the time (10h) it 
takes is quite long, so I decided to test the backup:

1) Restored the full backup to a test directory
(var/lib/postgresql/8.4/test)

2) Copied the configuration of the main server to
/etc/postgresql/8.4/test/
Altered port number, paths and turned off archive mode in postgresql.conf.

3) Added a recovery.conf in the test servers data dir. Recovering from 
my backed up WAL files:
restore_command = 'cp /var/postgresql-wal-test/%f "%p"'

4) Started the test server (pg_ctlcluster 8.4 test start)

5) Waited until recovery was done (everything worked fine until then)
2012-09-25 08:26:41 UTC LOG:  database system is ready to accept connections
2012-09-25 08:26:41 UTC LOG:  autovacuum launcher started

6) Connected via psql to the database and tried a \d to see my tables 
which did *not* work!
Here is the output:
2012-09-25 08:27:03 UTC ERROR:  could not access status of transaction 
500185903
2012-09-25 08:27:03 UTC DETAIL:  Could not open file "pg_clog/01DD": No 
such file or directory.

Also trying to SELECT data from the database tables failed with the same 
error.
The backup is corrupt. So my question is, what went wrong:
Obviously as the rsync started it copied everything from the pg_clog 
(which at this point was until pg_clog/01DC) and then went on for 
another 10+ hours backing up all the rest of the database. At the time 
the backup ended, the database content changed but the newer clog files 
did not go into the backup.
When restoring the backup and starting the server, the recovery process 
started at a point where pg_clog was at state 01DE or even further and 
thus the data from 01DD was missing.

So what I do from now, is an extra daily backup of my clog directory to 
make sure to have working backups. This is not documented in the 
postgreSQL documentation, and since the result in not doing so can be 
quite severe I think you should consider this in future PostgreSQL 
documentation versions.

Regards,
Juergen

Additional information:

The EXACT PostgreSQL version you are running:
PostgreSQL 8.4.13 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real 
(Debian 4.4.5-8) 4.4.5, 32-bit

How you installed PostgreSQL

 From Linux distro package management: Debian/Aptitude
If so, what repository?
deb http://ftp.at.debian.org/debian/ squeeze main non-free contrib
deb-src http://ftp.at.debian.org/debian/ squeeze main non-free contrib
deb http://security.debian.org/ squeeze/updates main contrib non-free
deb-src http://security.debian.org/ squeeze/updates main contrib non-free
deb http://ftp.debian.org/debian squeeze-updates main contrib non-free
deb-src http://ftp.debian.org/debian squeeze-updates main contrib non-free

Changes made to the settings in the postgresql.conf file:
name                  |  current_setting
----------------------+-------------------------------------------------
  version              | PostgreSQL 8.4.13 on i486-pc-linux-gnu, 
compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 32-bit
  archive_command      | cp -i %p /var/postgres-wal/%f </dev/null && cp 
-i %p /var/postgres-wal/bak/%f </dev/null && gzip /var/postgres-wal/bak/%f
  archive_mode         | on
  archive_timeout      | 0
  client_encoding      | utf8
  effective_cache_size | 1000MB
  lc_collate           | en_US.UTF-8
  lc_ctype             | en_US.UTF-8
  listen_addresses     | *
  log_line_prefix      | %t
  maintenance_work_mem | 256MB
  max_connections      | 100
  max_stack_depth      | 2MB
  password_encryption  | on
  port                 | 5432
  server_encoding      | UTF8
  shared_buffers       | 650MB
  ssl                  | on
  synchronous_commit   | off
  TimeZone             | UTC
  work_mem             | 40MB


Operating system and version
             Linux distro and version:
Debian 6.0.6 (squeeze)
             Kernel details:
Linux wegc203094 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686 
GNU/Linux

What program you're using to connect to PostgreSQL:
psql and phpPgAdmin

-- 
| Juergen Fuchsberger
| Wegener Center for Climate and Global Change
| Karl-Franzens-University Graz
| Leechgasse 25, A-8010 Graz
| phone: +43-316-380-8438
|   fax: +43-316-380-9830
| eMail: juergen.fuchsberger@uni-graz.at
|   web: www.wegcenter.at


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic