[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-admin
Subject:    [ADMIN] warm standby, pg_standby, invalid checkpoint record
From:       Brad Wiemerslage <wiemersl () yahoo ! com>
Date:       2009-02-27 7:47:44
Message-ID: 818050.20576.qm () web56001 ! mail ! re3 ! yahoo ! com
[Download RAW message or body]


I'm attempting to get warm standby up and running with a pair of servers running \
ubuntu 8.04 and postgresql 8.3.  Been following the docs:

http://www.postgresql.org/docs/8.3/static/warm-standby.html
http://www.postgresql.org/docs/current/static/pgstandby.html

Also, basically following the ideas here in this blog post:

http://scale-out-blog.blogspot.com/2009/02/simple-ha-with-postgresql-point-in-time.html


I've customized the original script he refers to in the article, which is here in its \
entirety for reference:

https://s3.amazonaws.com/extras.continuent.com/standby.sh

Here is the meat of my customized script, which runs on the standby.  The postgresql \
server on the standby is stopped first. 

start_backup="SELECT pg_start_backup('my_backup');"
stop_backup="SELECT pg_stop_backup();"
echo "$start_backup" | $psql -h$PRIMARY -U myuser -d mydb -e
rsync --delete -avz -e "ssh -i /path/to/key" myuser@$PRIMARY:$PG_DATA/ $PG_DATA
echo "$stop_backup" | $psql -h $PRIMARY -U myuser -d mydb -e

The files seem to copied over to the standby machine just fine.  Success is reported \
with respect to the backup commands.  Permissions seem fine.  

Next, there are some steps which blow out some files.  As I understand it, you no \
longer need the files on the standby that were in pg_xlog on the primary. 

rm -f $PG_DATA/recovery.*
rm -f $PG_DATA/8.3/main/logfile
rm -f $PG_DATA/8.3/main/postmaster.pid
rm -f $PG_DATA/8.3/main/pg_xlog/0*
rm -f $PG_DATA/8.3/main/pg_xlog/archive_status/0*

This step seems to work fine. 

Then, the archives are pulled.  They are pulled to /mnt/postgresql_archives with this \
command:

rsync --delete -avz -e "ssh -i /path/to/key" myuser@$PRIMARY:$PG_ARCHIVES/ \
$PG_ARCHIVES

Everything looks good.  I end up with an up to date list of WAL files in \
/mnt/postgresql_archives on the standby.  Here is a listing:

root@standby:/mnt/postgresql_archives# ls
total 688996
drwxr-xr-x  2 postgres postgres     4096 2009-02-27 01:13 .
drwxr-xr-x 14 root     root         4096 2009-02-27 01:17 ..
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:19 0000000100000000000000CB
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:29 0000000100000000000000CC
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:38 0000000100000000000000CD
-rw-rw----  1 postgres postgres      245 2009-02-27 00:38 \
                0000000100000000000000CD.00000020.backup
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:48 0000000100000000000000CE
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:54 0000000100000000000000CF
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:58 0000000100000000000000D0
-rw-rw----  1 postgres postgres 16777216 2009-02-27 01:01 0000000100000000000000D1
-rw-rw----  1 postgres postgres 16777216 2009-02-27 01:03 0000000100000000000000D2
-rw-rw----  1 postgres postgres 16777216 2009-02-27 01:13 0000000100000000000000D3

Then, the recovery.conf is put in place.  I've tried two different versions, which \
end up giving me the same error.  Here are the two different versions.  

#1: restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -c -d -s 2 -t \
/mnt/postgresql_archives/pgsql.trigger /mnt/postgresql_archives %f %p >> \
/mnt/postgresql_archives/standby.log 1>&2'

#2: restore_command = 'cp /mnt/server/archivedir/%f "%p"'

I don't believe that #2 is suitable for warm standby, but just tried it to debug \
after #1 wouldn't work.  Now, I try to start up the server.  For it to work in \
standby mode, additional archive files will be pulled from the primary machine on a \
periodic basis.  I'm using this command, which deletes them on the primary when they \
are no longer necessary.  It also seems to work fine.    

rsync -avz -e "ssh -i /path/to/key" myuser@$PRIMARY:$PG_ARCHIVES/ $PG_ARCHIVES

I guess I'm a little confused about exactly what is happening here when the server \
comes up, but here is the error message I'm getting.  It seems to be looking for the \
files in pg_pxlog, which is cleared out.  So, the error makes sense.  But isn't it \
supposed to be looking in /mnt/postgresql_archives per the restore_command(s)?  The \
files are available there.   

2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,2,,2009-02-27 01:26:52 \
EST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000100000000000000CD"" (log file \
0, segment 20 5): No such file or directory",,,,,,,,
2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,3,,2009-02-27 01:26:52 \
EST,,0,LOG,00000,"invalid checkpoint record",,,,,,,, 2009-02-27 01:26:52.867 \
EST,,,7422,,49a787ac.1cfe,4,,2009-02-27 01:26:52 EST,,0,PANIC,XX000,"could not locate \
required checkpoint record",,"If you are not restoring from a  backup, try removing \
the file ""/var/lib/postgresql/8.3/main/backup_label"".",,,,,, 2009-02-27 \
01:26:52.868 EST,,,7419,,49a787ab.1cfb,1,,2009-02-27 01:26:51 \
EST,,0,LOG,00000,"startup process (PID 7422) was terminated by signal 6: \
Aborted",,,,,,,, 2009-02-27 01:26:52.868 EST,,,7419,,49a787ab.1cfb,2,,2009-02-27 \
01:26:51 EST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,

So, I tried copying the files in the /mnt/postgresql_archives over to pg_xlog.  This \
seemed to work, and the updates were applied.  Never at any point did I get a \
recovery.done file.  Also, for whatever reason, I was never able to get any debug \
info from pg_standby in standby.log.

Anyhow, I've burned up a couple days trying to figure this out.  Any help would be \
much appreciated.  

Thanks,
Brad




-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic