'Re: Streaming replication: replicant server not starting (9.4.4, Win 2008)'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: Streaming replication: replicant server not starting (9.4.4, Win 2008)
From:       Ibrahim Edib Kokdemir <kokdemir () gmail ! com>
Date:       2018-01-30 18:38:23
Message-ID: CABnTomJwzi4PhXgzeJXibFt6-QmXGXZZzd2=kDvVjgrSbOTh=A () mail ! gmail ! com
[Download RAW message or body]

Hi Tim,
You have to enable hot_standby=on parameter on the replica server. See the
below link.
https://wiki.postgresql.org/wiki/Hot_Standby

Regards,
Ibrahim.

2018-01-30 21:16 GMT+03:00 Tim Bowden <tim.bowden@mapforge.com.au>:

> I've inherited a PG 9.4.4 install on Win 2008 that I'm wanting to
> stream from (abt 80Gb on disk).  Everything seems to be working from
> the "master" side.
>
> I have an AMI of the master host (a vmware instance in our soon to
> close data centre) that has been spun up in AWS (replicant).  In it's
> "default" configuration (ie, exactly the same as master) it worked
> fine.
>
> I ran pg_basebackup on the master (from memory- details on work pc):
> pg_basebackup -D "e:\\mybasebackup\\" -F t -R -X f -z -c fast \
>  -h <ip_addr> -U <rep_user>
>
> Base backup is created fine.  I've copied it up to AWS and extracted it
> into the correct location (after ensuring it is empty) on the replicant
> windows host (tar -xzvf base.tgz from a linux box with the pg data
> drive mounted).  File metadata is changed during the extraction (ie,
> file timestamps- could this possibly be an issue?) but permissions on
> the win2008 replicant host have been set & double checked.
>
> When I try to start the pg server on replicant, the log streaming from
> master works (and keeps working till the server is rebooted), but the
> server can't start for some reason.
>
> Copy of pg log file on replicant:
>
> 2018-01-30 17:18:02 AWST LOG:  database system was shut down in
> recovery at 2018-01-30 17:16:59 AWST
> 2018-01-30 17:18:02 AWST LOG:  entering standby mode
> 2018-01-30 17:18:02 AWST LOG:  redo starts at C26/84018A30
> 2018-01-30 17:18:02 AWST LOG:  consistent recovery state reached at
> C26/8401C6D0
> 2018-01-30 17:18:02 AWST LOG:  invalid record length at C26/8401C6D0
> 2018-01-30 17:18:03 AWST LOG:  started streaming WAL from primary at
> C26/84000000 on timeline 1
> 2018-01-30 17:18:03 AWST FATAL:  the database system is starting up
> 2018-01-30 17:18:04 AWST FATAL:  the database system is starting up
> 2018-01-30 17:18:05 AWST FATAL:  the database system is starting up
> 2018-01-30 17:18:06 AWST FATAL:  the database system is starting up
> 2018-01-30 17:18:07 AWST FATAL:  the database system is starting up
> 2018-01-30 17:18:08 AWST FATAL:  the database system is starting up
>
> The last line is repeated for about a minute till it gives up.  It
> leaves behind a bunch of processes that keep streaming log files as
> master creates new wal records.  This keeps going till the replicant
> box is rebooted (easiest way to clean up the processes and free up port
> 5432).
>
> I can keep rebooting the replicant host and on retrying to start the pg
> server, it will apply any new wal files that have been streamed, then
> fail to start the server but keep streaming again.
>
> The Windows service start command for PG:
>
> "C:\Program Files (x86)\PostgreSQL\9.4.4\bin\pg_ctl.exe" runservice -N
> "postgresql-9.4" -D "E:\PostgreSQL\9.4.4\data" -w
>
> Config files:
>
> Master postgresql.conf:
>
> dynamic_shared_memory_type = windows
> wal_level = hot_standby # Was "archive", but we'd like to do
> hot...
> archive_mode = on
> archive_command = 'copy "%p"
> "e:\\PostgreSQL\\9.4.4\\wal_archive\\%f"'  # Windows
> max_wal_senders = 8
> wal_keep_segments = 900 # Sick of losing old segments while I fix this
> max_replication_slots = 8
> log_line_prefix = '%t '
> log_timezone = 'Australia/Perth'
> datestyle = 'iso, dmy'
> timezone = 'Australia/Perth'
> lc_messages = 'English_Australia.1252'
> lc_monetary = 'English_Australia.1252'
> lc_numeric = 'English_Australia.1252'
> lc_time = 'English_Australia.1252'
> default_text_search_config = 'pg_catalog.english'
> listen_addresses = '*'
> port = 5432
> max_connections = 300
> shared_buffers = 500MB
> work_mem = 32MB
> maintenance_work_mem = 128MB
> checkpoint_segments = 100
> random_page_cost = 2.0
> effective_cache_size = 1500MB
> log_destination = 'stderr'
> logging_collector = on
> log_min_duration_statement = 500
> log_line_prefix = '%t '
>
> Replicant postgresql.conf (essentially a copy from master with minimal
> changes):
>
> dynamic_shared_memory_type = windows
> wal_level = archive
> max_standby_streaming_delay = 30s
> wal_receiver_status_interval = 30s
> log_line_prefix = '%t '
> log_timezone = 'Australia/Perth'
> datestyle = 'iso, dmy'
> timezone = 'Australia/Perth'
> lc_messages = 'English_Australia.1252'
> lc_monetary = 'English_Australia.1252'
> lc_numeric = 'English_Australia.1252'
> lc_time = 'English_Australia.1252'
> default_text_search_config = 'pg_catalog.english'
> listen_addresses = '*'
> port = 5432
> max_connections = 300
> shared_buffers = 500MB
> work_mem = 32MB
> maintenance_work_mem = 128MB
> checkpoint_segments = 100
> random_page_cost = 2.0
> effective_cache_size = 1500MB
> log_destination = 'stderr'
> logging_collector = on
> log_min_duration_statement = 500
> log_line_prefix = '%t '
>
> replicant recovery.conf:
>
> standby_mode = 'on'
> primary_conninfo = 'host=<master_ip> user=<rep_user>
> password=<don't_tell> connect_timeout=10 application_name=replicant_3'
> primary_slot_name = 'replicant_3'
> recovery_min_apply_delay = 10
> #restore_command = 'copy
> "e:\\PostgreSQL\\9.4.4\\basebackup\\wal_archive\\%f" "%p" '
> trigger_file = 'recovery_done.txt'
>
> The restore_command was uncommented at first and worked fine. All new
> updates now are provided by streaming.
>
> Any ideas?  I'm tearing my hair out with this. The boxes are about as
> identical as you can get, apart from the fact that one is running on
> VMware and the other in AWS.
>
> Thanks,
> Tim Bowden
>
>

[Attachment #3 (text/html)]

<div dir="ltr"><div>Hi Tim,  </div><div>You have to enable hot_standby=on parameter \
on the replica server. See the below link.     </div><a \
href="https://wiki.postgresql.org/wiki/Hot_Standby">https://wiki.postgresql.org/wiki/H \
ot_Standby</a><br><div><br></div><div>Regards,</div><div>Ibrahim.</div></div><div \
class="gmail_extra"><br><div class="gmail_quote">2018-01-30 21:16 GMT+03:00 Tim \
Bowden <span dir="ltr">&lt;<a href="mailto:tim.bowden@mapforge.com.au" \
target="_blank">tim.bowden@mapforge.com.au</a>&gt;</span>:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex">I&#39;ve inherited a PG 9.4.4 install on Win 2008 that \
I&#39;m wanting to<br> stream from (abt 80Gb on disk).   Everything seems to be \
working from<br> the &quot;master&quot; side.<br>
<br>
I have an AMI of the master host (a vmware instance in our soon to<br>
close data centre) that has been spun up in AWS (replicant).   In it&#39;s<br>
&quot;default&quot; configuration (ie, exactly the same as master) it worked<br>
fine.<br>
<br>
I ran pg_basebackup on the master (from memory- details on work pc):<br>
pg_basebackup -D &quot;e:\\mybasebackup\\&quot; -F t -R -X f -z -c fast \<br>
  -h &lt;ip_addr&gt; -U &lt;rep_user&gt;<br>
<br>
Base backup is created fine.   I&#39;ve copied it up to AWS and extracted it<br>
into the correct location (after ensuring it is empty) on the replicant<br>
windows host (tar -xzvf base.tgz from a linux box with the pg data<br>
drive mounted).   File metadata is changed during the extraction (ie,<br>
file timestamps- could this possibly be an issue?) but permissions on<br>
the win2008 replicant host have been set &amp; double checked.<br>
<br>
When I try to start the pg server on replicant, the log streaming from<br>
master works (and keeps working till the server is rebooted), but the<br>
server can&#39;t start for some reason.<br>
<br>
Copy of pg log file on replicant:<br>
<br>
2018-01-30 17:18:02 AWST LOG:    database system was shut down in<br>
recovery at 2018-01-30 17:16:59 AWST<br>
2018-01-30 17:18:02 AWST LOG:    entering standby mode<br>
2018-01-30 17:18:02 AWST LOG:    redo starts at C26/84018A30<br>
2018-01-30 17:18:02 AWST LOG:    consistent recovery state reached at<br>
C26/8401C6D0<br>
2018-01-30 17:18:02 AWST LOG:    invalid record length at C26/8401C6D0<br>
2018-01-30 17:18:03 AWST LOG:    started streaming WAL from primary at<br>
C26/84000000 on timeline 1<br>
2018-01-30 17:18:03 AWST FATAL:    the database system is starting up<br>
2018-01-30 17:18:04 AWST FATAL:    the database system is starting up<br>
2018-01-30 17:18:05 AWST FATAL:    the database system is starting up<br>
2018-01-30 17:18:06 AWST FATAL:    the database system is starting up<br>
2018-01-30 17:18:07 AWST FATAL:    the database system is starting up<br>
2018-01-30 17:18:08 AWST FATAL:    the database system is starting up<br>
<br>
The last line is repeated for about a minute till it gives up.   It<br>
leaves behind a bunch of processes that keep streaming log files as<br>
master creates new wal records.   This keeps going till the replicant<br>
box is rebooted (easiest way to clean up the processes and free up port<br>
5432).<br>
<br>
I can keep rebooting the replicant host and on retrying to start the pg<br>
server, it will apply any new wal files that have been streamed, then<br>
fail to start the server but keep streaming again.<br>
<br>
The Windows service start command for PG:  <br>
<br>
&quot;C:\Program Files (x86)\PostgreSQL\9.4.4\bin\pg_<wbr>ctl.exe&quot; runservice \
-N<br> &quot;postgresql-9.4&quot; -D &quot;E:\PostgreSQL\9.4.4\data&quot; -w<br>
<br>
Config files:<br>
<br>
Master postgresql.conf:<br>
<br>
dynamic_shared_memory_type = windows<br>
wal_level = hot_standby # Was &quot;archive&quot;, but we&#39;d like to do<br>
hot...<br>
archive_mode = on<br>
archive_command = &#39;copy &quot;%p&quot;<br>
&quot;e:\\PostgreSQL\\9.4.4\\wal_<wbr>archive\\%f&quot;&#39;    # Windows<br>
max_wal_senders = 8<br>
wal_keep_segments = 900 # Sick of losing old segments while I fix this<br>
max_replication_slots = 8<br>
log_line_prefix = &#39;%t &#39;<br>
log_timezone = &#39;Australia/Perth&#39;<br>
datestyle = &#39;iso, dmy&#39;<br>
timezone = &#39;Australia/Perth&#39;<br>
lc_messages = &#39;English_Australia.1252&#39;<br>
lc_monetary = &#39;English_Australia.1252&#39;<br>
lc_numeric = &#39;English_Australia.1252&#39;<br>
lc_time = &#39;English_Australia.1252&#39;<br>
default_text_search_config = &#39;pg_catalog.english&#39;<br>
listen_addresses = &#39;*&#39;<br>
port = 5432<br>
max_connections = 300<br>
shared_buffers = 500MB<br>
work_mem = 32MB<br>
maintenance_work_mem = 128MB<br>
checkpoint_segments = 100<br>
random_page_cost = 2.0<br>
effective_cache_size = 1500MB<br>
log_destination = &#39;stderr&#39;<br>
logging_collector = on<br>
log_min_duration_statement = 500<br>
log_line_prefix = &#39;%t &#39;<br>
<br>
Replicant postgresql.conf (essentially a copy from master with minimal<br>
changes):<br>
<br>
dynamic_shared_memory_type = windows<br>
wal_level = archive<br>
max_standby_streaming_delay = 30s<br>
wal_receiver_status_interval = 30s<br>
log_line_prefix = &#39;%t &#39;<br>
log_timezone = &#39;Australia/Perth&#39;<br>
datestyle = &#39;iso, dmy&#39;<br>
timezone = &#39;Australia/Perth&#39;<br>
lc_messages = &#39;English_Australia.1252&#39;<br>
lc_monetary = &#39;English_Australia.1252&#39;<br>
lc_numeric = &#39;English_Australia.1252&#39;<br>
lc_time = &#39;English_Australia.1252&#39;<br>
default_text_search_config = &#39;pg_catalog.english&#39;<br>
listen_addresses = &#39;*&#39;<br>
port = 5432<br>
max_connections = 300<br>
shared_buffers = 500MB<br>
work_mem = 32MB<br>
maintenance_work_mem = 128MB<br>
checkpoint_segments = 100<br>
random_page_cost = 2.0<br>
effective_cache_size = 1500MB<br>
log_destination = &#39;stderr&#39;<br>
logging_collector = on<br>
log_min_duration_statement = 500<br>
log_line_prefix = &#39;%t &#39;<br>
<br>
replicant recovery.conf:<br>
<br>
standby_mode = &#39;on&#39;<br>
primary_conninfo = &#39;host=&lt;master_ip&gt; user=&lt;rep_user&gt;<br>
password=&lt;don&#39;t_tell&gt; connect_timeout=10 \
application_name=replicant_3&#39;<br> primary_slot_name = &#39;replicant_3&#39;<br>
recovery_min_apply_delay = 10<br>
#restore_command = &#39;copy<br>
&quot;e:\\PostgreSQL\\9.4.4\\<wbr>basebackup\\wal_archive\\%f&quot; &quot;%p&quot; \
&#39;  <br> trigger_file = &#39;recovery_done.txt&#39;<br>
<br>
The restore_command was uncommented at first and worked fine. All new<br>
updates now are provided by streaming.<br>
<br>
Any ideas?   I&#39;m tearing my hair out with this. The boxes are about as<br>
identical as you can get, apart from the fact that one is running on<br>
VMware and the other in AWS.<br>
<br>
Thanks,<br>
Tim Bowden<br>
<br>
</blockquote></div><br></div>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic