List: postgresql-admin
Subject: [ADMIN] warm standby, pg_standby, invalid checkpoint record
From: Brad Wiemerslage <wiemersl () yahoo ! com>
Date: 2009-02-27 7:47:44
Message-ID: 818050.20576.qm () web56001 ! mail ! re3 ! yahoo ! com
I'm attempting to get warm standby up and running on a pair of servers running Ubuntu 8.04 and PostgreSQL 8.3. I've been following the docs:
http://www.postgresql.org/docs/8.3/static/warm-standby.html
http://www.postgresql.org/docs/current/static/pgstandby.html
I'm also basically following the ideas in this blog post:
http://scale-out-blog.blogspot.com/2009/02/simple-ha-with-postgresql-point-in-time.html
I've customized the original script he refers to in the article, which is here in its entirety for reference:
https://s3.amazonaws.com/extras.continuent.com/standby.sh
Here is the meat of my customized script, which runs on the standby. The PostgreSQL server on the standby is stopped first.
start_backup="SELECT pg_start_backup('my_backup');"
stop_backup="SELECT pg_stop_backup();"
echo "$start_backup" | $psql -h "$PRIMARY" -U myuser -d mydb -e
rsync --delete -avz -e "ssh -i /path/to/key" "myuser@$PRIMARY:$PG_DATA/" "$PG_DATA"
echo "$stop_backup" | $psql -h "$PRIMARY" -U myuser -d mydb -e
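Each step above runs independently, so a failed copy still goes on to pg_stop_backup() and then to the cleanup steps, with nothing to flag the failure. A minimal bash sketch of a wrapper that returns one combined status, so a calling script could stop early. The function name take_base_backup and the SSH_KEY, PSQL and RSYNC overrides are hypothetical; myuser, mydb, /path/to/key, PRIMARY and PG_DATA come from the script above. It treats rsync exit code 24 ("some source files vanished") as success, since files in a live data directory can disappear mid-copy.

```shell
#!/bin/bash
# Hypothetical wrapper around the base-backup steps.
# Assumes PRIMARY and PG_DATA are set; SSH_KEY, PSQL and RSYNC are
# illustrative overrides (defaults: /path/to/key, psql, rsync).
take_base_backup() {
  local psql=${PSQL:-psql}
  local rsync=${RSYNC:-rsync}
  local key=${SSH_KEY:-/path/to/key}
  local status=0

  # Do not copy anything unless the primary confirms backup mode.
  "$psql" -h "$PRIMARY" -U myuser -d mydb -c "SELECT pg_start_backup('my_backup');" || return 1

  "$rsync" --delete -avz -e "ssh -i $key" "myuser@$PRIMARY:$PG_DATA/" "$PG_DATA" || status=$?
  # rsync exit code 24: some source files vanished during the transfer,
  # which is expected when copying a live data directory.
  if [ "$status" -eq 24 ]; then
    status=0
  fi

  # Always try to leave backup mode, even if the copy failed.
  "$psql" -h "$PRIMARY" -U myuser -d mydb -c "SELECT pg_stop_backup();" || status=$?
  return "$status"
}
```

A caller could then run `take_base_backup || exit 1` before deleting anything on the standby.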
The files seem to be copied over to the standby machine just fine. Both backup commands report success, and the permissions seem fine.
Next, some steps delete a few files. As I understand it, the standby no longer needs the files that were in pg_xlog on the primary.
rm -f $PG_DATA/recovery.*
rm -f $PG_DATA/8.3/main/logfile
rm -f $PG_DATA/8.3/main/postmaster.pid
rm -f $PG_DATA/8.3/main/pg_xlog/0*
rm -f $PG_DATA/8.3/main/pg_xlog/archive_status/0*
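The cleanup above mixes `$PG_DATA/recovery.*` with paths under `$PG_DATA/8.3/main`, so which directory actually gets cleaned depends on what PG_DATA points to. A minimal bash sketch that takes the data directory explicitly and refuses to run unless it contains a PG_VERSION file (present in every PostgreSQL data directory). The function name is hypothetical. It deliberately leaves backup_label in place, since recovery reads it at startup to find the checkpoint to start from.

```shell
#!/bin/bash
# Hypothetical helper: remove files copied from the primary that the
# standby must not reuse, relative to one explicit data directory.
clean_standby_datadir() {
  local datadir=$1
  # Refuse to delete anything outside a PostgreSQL data directory.
  if [ ! -f "$datadir/PG_VERSION" ]; then
    echo "not a PostgreSQL data directory: $datadir" >&2
    return 1
  fi
  rm -f "$datadir"/recovery.* "$datadir/logfile" "$datadir/postmaster.pid"
  # WAL segments and their archive status flags copied from the primary.
  rm -f "$datadir"/pg_xlog/0* "$datadir"/pg_xlog/archive_status/0*
  # backup_label is left alone on purpose: recovery reads it at startup.
}
```

For example, `clean_standby_datadir /var/lib/postgresql/8.3/main`, using the data directory that appears in the backup_label path in the log below.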
This step seems to work fine.
Then the archives are pulled to /mnt/postgresql_archives with this command:
rsync --delete -avz -e "ssh -i /path/to/key" myuser@$PRIMARY:$PG_ARCHIVES/ $PG_ARCHIVES
Everything looks good. I end up with an up-to-date set of WAL files in /mnt/postgresql_archives on the standby. Here is a listing:
root@standby:/mnt/postgresql_archives# ls
total 688996
drwxr-xr-x 2 postgres postgres 4096 2009-02-27 01:13 .
drwxr-xr-x 14 root root 4096 2009-02-27 01:17 ..
-rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:19 0000000100000000000000CB
-rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:29 0000000100000000000000CC
-rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:38 0000000100000000000000CD
-rw-rw---- 1 postgres postgres 245 2009-02-27 00:38 0000000100000000000000CD.00000020.backup
-rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:48 0000000100000000000000CE
-rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:54 0000000100000000000000CF
-rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:58 0000000100000000000000D0
-rw-rw---- 1 postgres postgres 16777216 2009-02-27 01:01 0000000100000000000000D1
-rw-rw---- 1 postgres postgres 16777216 2009-02-27 01:03 0000000100000000000000D2
-rw-rw---- 1 postgres postgres 16777216 2009-02-27 01:13 0000000100000000000000D3
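The .backup file in the listing is the backup history file written by pg_stop_backup(); its name says the base backup started in segment 0000000100000000000000CD, at offset 0x20. A minimal bash sketch that reads the starting segment back out of that file and checks that the segment is present in the archive. It assumes the file contains a line of the form `START WAL LOCATION: 0/CD000020 (file 0000000100000000000000CD)`; both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for checking a base backup against the WAL archive.

# Print the segment name from the "START WAL LOCATION" line of a backup
# history file (assumed format: "START WAL LOCATION: 0/CD000020 (file NAME)").
backup_start_segment() {
  sed -n 's/^START WAL LOCATION: .*(file \([0-9A-F]\{24\}\))$/\1/p' "$1"
}

# Succeed (and print the segment) only if the starting segment is archived.
require_start_segment() {
  local history=$1 archive=$2 seg
  seg=$(backup_start_segment "$history")
  if [ -z "$seg" ]; then
    echo "no START WAL LOCATION in $history" >&2
    return 1
  fi
  if [ ! -f "$archive/$seg" ]; then
    echo "segment $seg is missing from $archive" >&2
    return 1
  fi
  echo "$seg"
}
```

Usage against the listing above would be `require_start_segment /mnt/postgresql_archives/0000000100000000000000CD.00000020.backup /mnt/postgresql_archives`.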
Then the recovery.conf is put in place. I've tried two different versions, and both give me the same error:
#1: restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -c -d -s 2 -t /mnt/postgresql_archives/pgsql.trigger /mnt/postgresql_archives %f %p >> /mnt/postgresql_archives/standby.log 1>&2'
#2: restore_command = 'cp /mnt/server/archivedir/%f "%p"'
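In version #1, `>> /mnt/postgresql_archives/standby.log 1>&2` first points stdout at the log file and then re-points stdout at stderr, so nothing is written to standby.log. pg_standby writes its -d debug messages to stderr, which the pg_standby documentation's example captures with `2>>`. A minimal bash sketch that writes version #1 with only that redirection changed; recovery.conf is read from the cluster data directory, which is passed in. The function name is hypothetical and the other pg_standby arguments are copied unchanged.

```shell
#!/bin/bash
# Hypothetical helper: write restore_command #1 into DATADIR/recovery.conf,
# appending pg_standby's stderr (where -d output goes) to standby.log.
write_recovery_conf() {
  local datadir=$1
  # Quoted heredoc delimiter: %f, %p and the quotes are written literally.
  cat > "$datadir/recovery.conf" <<'EOF'
restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -c -d -s 2 -t /mnt/postgresql_archives/pgsql.trigger /mnt/postgresql_archives %f %p 2>> /mnt/postgresql_archives/standby.log'
EOF
}
```

For example, `write_recovery_conf /var/lib/postgresql/8.3/main`.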
I don't believe #2 is suitable for warm standby, but I tried it for debugging after #1 didn't work. Next, I try to start the server. For it to work in standby mode, additional archive files will be pulled from the primary machine periodically. I'm using this command, which deletes them on the primary when they are no longer necessary. It also seems to work fine.
rsync -avz -e "ssh -i /path/to/key" myuser@$PRIMARY:$PG_ARCHIVES/ $PG_ARCHIVES
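For reference, rsync as invoked here copies new segments to the standby without deleting anything on either side; removing files from the source needs an extra option, such as `--remove-source-files` in rsync 3.0 and later. By default rsync writes each file under a temporary name and renames it into place once complete, so pg_standby should not see a half-copied segment. A minimal bash sketch of the periodic pull as a function plus a polling loop. The names pull_archives, pull_archives_loop, PULL_INTERVAL, SSH_KEY and the RSYNC override are hypothetical; a cron job would work just as well.

```shell
#!/bin/bash
# Hypothetical periodic pull of archived WAL from the primary.
# Assumes PRIMARY and PG_ARCHIVES are set, as in the commands above.
pull_archives() {
  local rsync=${RSYNC:-rsync}
  local key=${SSH_KEY:-/path/to/key}
  # No --delete: segments already on the standby stay there even after
  # they are removed from the primary's archive.
  "$rsync" -avz -e "ssh -i $key" "myuser@$PRIMARY:$PG_ARCHIVES/" "$PG_ARCHIVES"
}

# Poll forever; a failed pull is reported and retried on the next pass.
pull_archives_loop() {
  local interval=${PULL_INTERVAL:-60}
  while :; do
    pull_archives || echo "archive pull failed (rc=$?)" >&2
    sleep "$interval"
  done
}
```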
I'm a little confused about exactly what happens when the server comes up, but here is the error message I'm getting. It seems to be looking for the files in pg_xlog, which has been cleared out, so the error makes sense. But isn't it supposed to be looking in /mnt/postgresql_archives, per the restore_command? The files are available there.
2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,2,,2009-02-27 01:26:52 EST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000100000000000000CD"" (log file 0, segment 205): No such file or directory",,,,,,,,
2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,3,,2009-02-27 01:26:52 EST,,0,LOG,00000,"invalid checkpoint record",,,,,,,,
2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,4,,2009-02-27 01:26:52 EST,,0,PANIC,XX000,"could not locate required checkpoint record",,"If you are not restoring from a backup, try removing the file ""/var/lib/postgresql/8.3/main/backup_label"".",,,,,,
2009-02-27 01:26:52.868 EST,,,7419,,49a787ab.1cfb,1,,2009-02-27 01:26:51 EST,,0,LOG,00000,"startup process (PID 7422) was terminated by signal 6: Aborted",,,,,,,,
2009-02-27 01:26:52.868 EST,,,7419,,49a787ab.1cfb,2,,2009-02-27 01:26:51 EST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,
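The file name and the "(log file 0, segment 205)" part of the first log line describe the same segment: a WAL segment name is three 8-digit hexadecimal fields (timeline, log file, segment), so 0000000100000000000000CD is timeline 1, log file 0, segment 0xCD = 205. That is the same segment named by the .backup file in the archive listing. A small bash sketch of the decoding; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: split a 24-character WAL segment name into its
# timeline, log file and segment numbers (printed in decimal).
wal_name_parts() {
  local name=$1
  if [[ ! $name =~ ^[0-9A-F]{24}$ ]]; then
    echo "not a WAL segment name: $name" >&2
    return 1
  fi
  local tli=$((16#${name:0:8}))
  local log=$((16#${name:8:8}))
  local seg=$((16#${name:16:8}))
  echo "timeline $tli, log file $log, segment $seg"
}

wal_name_parts 0000000100000000000000CD
# prints: timeline 1, log file 0, segment 205
```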
So I tried copying the files from /mnt/postgresql_archives into pg_xlog. This seemed to work, and the updates were applied. However, I never got a recovery.done file at any point. Also, for whatever reason, I was never able to get any debug output from pg_standby in standby.log.
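The server only runs restore_command when it finds recovery.conf in its data directory at startup, and it renames that file to recovery.done once recovery finishes. On Ubuntu the configuration files live under /etc/postgresql/8.3/main, while the data directory here, judging by the backup_label path in the log, is /var/lib/postgresql/8.3/main. A minimal bash sketch of a pre-start check, assuming that data directory as the default; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-start check for a warm-standby data directory.
check_standby_datadir() {
  local datadir=${1:-/var/lib/postgresql/8.3/main}
  local status=0
  # backup_label tells recovery which checkpoint to start from.
  if [ ! -f "$datadir/backup_label" ]; then
    echo "missing $datadir/backup_label" >&2
    status=1
  fi
  # recovery.conf must sit in the data directory, not next to postgresql.conf.
  if [ ! -f "$datadir/recovery.conf" ]; then
    echo "missing $datadir/recovery.conf" >&2
    status=1
  elif ! grep -q "^[[:space:]]*restore_command[[:space:]]*=" "$datadir/recovery.conf"; then
    echo "no restore_command in $datadir/recovery.conf" >&2
    status=1
  fi
  return "$status"
}
```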
Anyhow, I've burned a couple of days trying to figure this out. Any help would be much appreciated.
Thanks,
Brad
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)