[prev in list] [next in list] [prev in thread] [next in thread]
List: postgresql-general
Subject: Re: Streaming replication: replicant server not starting (9.4.4, Win 2008)
From: Ibrahim Edib Kokdemir <kokdemir () gmail ! com>
Date: 2018-01-30 18:38:23
Message-ID: CABnTomJwzi4PhXgzeJXibFt6-QmXGXZZzd2=kDvVjgrSbOTh=A () mail ! gmail ! com
[Download RAW message or body]
Hi Tim,
You have to enable hot_standby=on parameter on the replica server. See the
below link.
https://wiki.postgresql.org/wiki/Hot_Standby
Regards,
Ibrahim.
2018-01-30 21:16 GMT+03:00 Tim Bowden <tim.bowden@mapforge.com.au>:
> I've inherited a PG 9.4.4 install on Win 2008 that I'm wanting to
> stream from (abt 80Gb on disk). Everything seems to be working from
> the "master" side.
>
> I have an AMI of the master host (a vmware instance in our soon to
> close data centre) that has been spun up in AWS (replicant). In it's
> "default" configuration (ie, exactly the same as master) it worked
> fine.
>
> I ran pg_basebackup on the master (from memory- details on work pc):
> pg_basebackup -D "e:\\mybasebackup\\" -F t -R -X f -z -c fast \
> -h <ip_addr> -U <rep_user>
>
> Base backup is created fine. I've copied it up to AWS and extracted it
> into the correct location (after ensuring it is empty) on the replicant
> windows host (tar -xzvf base.tgz from a linux box with the pg data
> drive mounted). File metadata is changed during the extraction (ie,
> file timestamps- could this possibly be an issue?) but permissions on
> the win2008 replicant host have been set & double checked.
>
> When I try to start the pg server on replicant, the log streaming from
> master works (and keeps working till the server is rebooted), but the
> server can't start for some reason.
>
> Copy of pg log file on replicant:
>
> 2018-01-30 17:18:02 AWST LOG: database system was shut down in
> recovery at 2018-01-30 17:16:59 AWST
> 2018-01-30 17:18:02 AWST LOG: entering standby mode
> 2018-01-30 17:18:02 AWST LOG: redo starts at C26/84018A30
> 2018-01-30 17:18:02 AWST LOG: consistent recovery state reached at
> C26/8401C6D0
> 2018-01-30 17:18:02 AWST LOG: invalid record length at C26/8401C6D0
> 2018-01-30 17:18:03 AWST LOG: started streaming WAL from primary at
> C26/84000000 on timeline 1
> 2018-01-30 17:18:03 AWST FATAL: the database system is starting up
> 2018-01-30 17:18:04 AWST FATAL: the database system is starting up
> 2018-01-30 17:18:05 AWST FATAL: the database system is starting up
> 2018-01-30 17:18:06 AWST FATAL: the database system is starting up
> 2018-01-30 17:18:07 AWST FATAL: the database system is starting up
> 2018-01-30 17:18:08 AWST FATAL: the database system is starting up
>
> The last line is repeated for about a minute till it gives up. It
> leaves behind a bunch of processes that keep streaming log files as
> master creates new wal records. This keeps going till the replicant
> box is rebooted (easiest way to clean up the processes and free up port
> 5432).
>
> I can keep rebooting the replicant host and on retrying to start the pg
> server, it will apply any new wal files that have been streamed, then
> fail to start the server but keep streaming again.
>
> The Windows service start command for PG:
>
> "C:\Program Files (x86)\PostgreSQL\9.4.4\bin\pg_ctl.exe" runservice -N
> "postgresql-9.4" -D "E:\PostgreSQL\9.4.4\data" -w
>
> Config files:
>
> Master postgresql.conf:
>
> dynamic_shared_memory_type = windows
> wal_level = hot_standby # Was "archive", but we'd like to do
> hot...
> archive_mode = on
> archive_command = 'copy "%p"
> "e:\\PostgreSQL\\9.4.4\\wal_archive\\%f"' # Windows
> max_wal_senders = 8
> wal_keep_segments = 900 # Sick of losing old segments while I fix this
> max_replication_slots = 8
> log_line_prefix = '%t '
> log_timezone = 'Australia/Perth'
> datestyle = 'iso, dmy'
> timezone = 'Australia/Perth'
> lc_messages = 'English_Australia.1252'
> lc_monetary = 'English_Australia.1252'
> lc_numeric = 'English_Australia.1252'
> lc_time = 'English_Australia.1252'
> default_text_search_config = 'pg_catalog.english'
> listen_addresses = '*'
> port = 5432
> max_connections = 300
> shared_buffers = 500MB
> work_mem = 32MB
> maintenance_work_mem = 128MB
> checkpoint_segments = 100
> random_page_cost = 2.0
> effective_cache_size = 1500MB
> log_destination = 'stderr'
> logging_collector = on
> log_min_duration_statement = 500
> log_line_prefix = '%t '
>
> Replicant postgresql.conf (essentially a copy from master with minimal
> changes):
>
> dynamic_shared_memory_type = windows
> wal_level = archive
> max_standby_streaming_delay = 30s
> wal_receiver_status_interval = 30s
> log_line_prefix = '%t '
> log_timezone = 'Australia/Perth'
> datestyle = 'iso, dmy'
> timezone = 'Australia/Perth'
> lc_messages = 'English_Australia.1252'
> lc_monetary = 'English_Australia.1252'
> lc_numeric = 'English_Australia.1252'
> lc_time = 'English_Australia.1252'
> default_text_search_config = 'pg_catalog.english'
> listen_addresses = '*'
> port = 5432
> max_connections = 300
> shared_buffers = 500MB
> work_mem = 32MB
> maintenance_work_mem = 128MB
> checkpoint_segments = 100
> random_page_cost = 2.0
> effective_cache_size = 1500MB
> log_destination = 'stderr'
> logging_collector = on
> log_min_duration_statement = 500
> log_line_prefix = '%t '
>
> replicant recovery.conf:
>
> standby_mode = 'on'
> primary_conninfo = 'host=<master_ip> user=<rep_user>
> password=<don't_tell> connect_timeout=10 application_name=replicant_3'
> primary_slot_name = 'replicant_3'
> recovery_min_apply_delay = 10
> #restore_command = 'copy
> "e:\\PostgreSQL\\9.4.4\\basebackup\\wal_archive\\%f" "%p" '
> trigger_file = 'recovery_done.txt'
>
> The restore_command was uncommented at first and worked fine. All new
> updates now are provided by streaming.
>
> Any ideas? I'm tearing my hair out with this. The boxes are about as
> identical as you can get, apart from the fact that one is running on
> VMware and the other in AWS.
>
> Thanks,
> Tim Bowden
>
>
[Attachment #3 (text/html)]
<div dir="ltr"><div>Hi Tim, </div><div>You have to enable hot_standby=on parameter \
on the replica server. See the below link. </div><a \
href="https://wiki.postgresql.org/wiki/Hot_Standby">https://wiki.postgresql.org/wiki/H \
ot_Standby</a><br><div><br></div><div>Regards,</div><div>Ibrahim.</div></div><div \
class="gmail_extra"><br><div class="gmail_quote">2018-01-30 21:16 GMT+03:00 Tim \
Bowden <span dir="ltr"><<a href="mailto:tim.bowden@mapforge.com.au" \
target="_blank">tim.bowden@mapforge.com.au</a>></span>:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex">I've inherited a PG 9.4.4 install on Win 2008 that \
I'm wanting to<br> stream from (abt 80Gb on disk). Everything seems to be \
working from<br> the "master" side.<br>
<br>
I have an AMI of the master host (a vmware instance in our soon to<br>
close data centre) that has been spun up in AWS (replicant). In it's<br>
"default" configuration (ie, exactly the same as master) it worked<br>
fine.<br>
<br>
I ran pg_basebackup on the master (from memory- details on work pc):<br>
pg_basebackup -D "e:\\mybasebackup\\" -F t -R -X f -z -c fast \<br>
-h <ip_addr> -U <rep_user><br>
<br>
Base backup is created fine. I've copied it up to AWS and extracted it<br>
into the correct location (after ensuring it is empty) on the replicant<br>
windows host (tar -xzvf base.tgz from a linux box with the pg data<br>
drive mounted). File metadata is changed during the extraction (ie,<br>
file timestamps- could this possibly be an issue?) but permissions on<br>
the win2008 replicant host have been set & double checked.<br>
<br>
When I try to start the pg server on replicant, the log streaming from<br>
master works (and keeps working till the server is rebooted), but the<br>
server can't start for some reason.<br>
<br>
Copy of pg log file on replicant:<br>
<br>
2018-01-30 17:18:02 AWST LOG: database system was shut down in<br>
recovery at 2018-01-30 17:16:59 AWST<br>
2018-01-30 17:18:02 AWST LOG: entering standby mode<br>
2018-01-30 17:18:02 AWST LOG: redo starts at C26/84018A30<br>
2018-01-30 17:18:02 AWST LOG: consistent recovery state reached at<br>
C26/8401C6D0<br>
2018-01-30 17:18:02 AWST LOG: invalid record length at C26/8401C6D0<br>
2018-01-30 17:18:03 AWST LOG: started streaming WAL from primary at<br>
C26/84000000 on timeline 1<br>
2018-01-30 17:18:03 AWST FATAL: the database system is starting up<br>
2018-01-30 17:18:04 AWST FATAL: the database system is starting up<br>
2018-01-30 17:18:05 AWST FATAL: the database system is starting up<br>
2018-01-30 17:18:06 AWST FATAL: the database system is starting up<br>
2018-01-30 17:18:07 AWST FATAL: the database system is starting up<br>
2018-01-30 17:18:08 AWST FATAL: the database system is starting up<br>
<br>
The last line is repeated for about a minute till it gives up. It<br>
leaves behind a bunch of processes that keep streaming log files as<br>
master creates new wal records. This keeps going till the replicant<br>
box is rebooted (easiest way to clean up the processes and free up port<br>
5432).<br>
<br>
I can keep rebooting the replicant host and on retrying to start the pg<br>
server, it will apply any new wal files that have been streamed, then<br>
fail to start the server but keep streaming again.<br>
<br>
The Windows service start command for PG: <br>
<br>
"C:\Program Files (x86)\PostgreSQL\9.4.4\bin\pg_<wbr>ctl.exe" runservice \
-N<br> "postgresql-9.4" -D "E:\PostgreSQL\9.4.4\data" -w<br>
<br>
Config files:<br>
<br>
Master postgresql.conf:<br>
<br>
dynamic_shared_memory_type = windows<br>
wal_level = hot_standby # Was "archive", but we'd like to do<br>
hot...<br>
archive_mode = on<br>
archive_command = 'copy "%p"<br>
"e:\\PostgreSQL\\9.4.4\\wal_<wbr>archive\\%f"' # Windows<br>
max_wal_senders = 8<br>
wal_keep_segments = 900 # Sick of losing old segments while I fix this<br>
max_replication_slots = 8<br>
log_line_prefix = '%t '<br>
log_timezone = 'Australia/Perth'<br>
datestyle = 'iso, dmy'<br>
timezone = 'Australia/Perth'<br>
lc_messages = 'English_Australia.1252'<br>
lc_monetary = 'English_Australia.1252'<br>
lc_numeric = 'English_Australia.1252'<br>
lc_time = 'English_Australia.1252'<br>
default_text_search_config = 'pg_catalog.english'<br>
listen_addresses = '*'<br>
port = 5432<br>
max_connections = 300<br>
shared_buffers = 500MB<br>
work_mem = 32MB<br>
maintenance_work_mem = 128MB<br>
checkpoint_segments = 100<br>
random_page_cost = 2.0<br>
effective_cache_size = 1500MB<br>
log_destination = 'stderr'<br>
logging_collector = on<br>
log_min_duration_statement = 500<br>
log_line_prefix = '%t '<br>
<br>
Replicant postgresql.conf (essentially a copy from master with minimal<br>
changes):<br>
<br>
dynamic_shared_memory_type = windows<br>
wal_level = archive<br>
max_standby_streaming_delay = 30s<br>
wal_receiver_status_interval = 30s<br>
log_line_prefix = '%t '<br>
log_timezone = 'Australia/Perth'<br>
datestyle = 'iso, dmy'<br>
timezone = 'Australia/Perth'<br>
lc_messages = 'English_Australia.1252'<br>
lc_monetary = 'English_Australia.1252'<br>
lc_numeric = 'English_Australia.1252'<br>
lc_time = 'English_Australia.1252'<br>
default_text_search_config = 'pg_catalog.english'<br>
listen_addresses = '*'<br>
port = 5432<br>
max_connections = 300<br>
shared_buffers = 500MB<br>
work_mem = 32MB<br>
maintenance_work_mem = 128MB<br>
checkpoint_segments = 100<br>
random_page_cost = 2.0<br>
effective_cache_size = 1500MB<br>
log_destination = 'stderr'<br>
logging_collector = on<br>
log_min_duration_statement = 500<br>
log_line_prefix = '%t '<br>
<br>
replicant recovery.conf:<br>
<br>
standby_mode = 'on'<br>
primary_conninfo = 'host=<master_ip> user=<rep_user><br>
password=<don't_tell> connect_timeout=10 \
application_name=replicant_3'<br> primary_slot_name = 'replicant_3'<br>
recovery_min_apply_delay = 10<br>
#restore_command = 'copy<br>
"e:\\PostgreSQL\\9.4.4\\<wbr>basebackup\\wal_archive\\%f" "%p" \
' <br> trigger_file = 'recovery_done.txt'<br>
<br>
The restore_command was uncommented at first and worked fine. All new<br>
updates now are provided by streaming.<br>
<br>
Any ideas? I'm tearing my hair out with this. The boxes are about as<br>
identical as you can get, apart from the fact that one is running on<br>
VMware and the other in AWS.<br>
<br>
Thanks,<br>
Tim Bowden<br>
<br>
</blockquote></div><br></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic