[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-admin
Subject:    Re: Streaming replication question
From:       Mark Steben <mark.steben () drivedominion ! com>
Date:       2019-08-04 11:06:32
Message-ID: CADyzmyyG6i9fBcBVPXKzRZLBp23Sb9OWGpYYdYQT1iPNU8dYxw () mail ! gmail ! com
[Download RAW message or body]

Thank your for your prompt response Keith.  You were correct.  The root of
the problem was an ssh host key change which caused all logshipping to
error, and therefore pg_xlog queued up.

On Sat, Aug 3, 2019 at 11:23 AM Keith <keith@keithf4.com> wrote:

>
>
> On Sat, Aug 3, 2019 at 9:36 AM Mark Steben <mark.steben@drivedominion.com>
> wrote:
>
>> Good morning,
>> We run postgres 9.4. Early Saturday morning we had a production postgres
>> outage because our pg_xlog directory ran out of space.  Tracing the cause
>> points to a scheduled reboot of the 3 database servers by our IT team to
>> install some monitoring software.  Because we did a simple stop/restart of
>> the database our replication slots did not get reset and pg_logs queued up
>> until the out-of-space condition
>>
>> Can someone please verify that the following WOULD HAVE BEEN the
>> appropriate action:
>> (Or offer corrections)
>>   1. Stop the postgres database (on both slaves)
>>   2. Run pg_drop_replication_slot (rep1, rep2 slotnames)  (on Master)
>>   3. Stop the postgres database (on master)
>>   4.  Reboot Linux Server to bring in monitoring software
>>   5.  Start the postgres database (on master)
>>   6.  Run pg_create_replication_slot (rep1, rep2 slotnames) (On Master)
>>   7. Start the postgres database (on both slaves)
>>
>> Thank you
>>
>> --
>> *Mark Steben*
>>  Database Administrator
>> @utoRevenue <http://www.autorevenue.com/> | Autobase
>> <http://www.autobase.net/>
>>   CRM division of Dominion Dealer Solutions
>> 95D Ashley Ave.
>> West Springfield, MA 01089
>> t: 413.327-3045
>> f: 413.383-9567
>>
>> www.fb.com/DominionDealerSolutions
>> www.twitter.com/DominionDealer
>>  www.drivedominion.com <http://www.autorevenue.com/>
>>
>> <http://autobasedigital.net/marketing/DD12_sig.jpg>
>>
>>
>>
> Restarting your databases should not have affected the replication slots
> like this. If you stopped the replicas first, then the primary would have
> just started keeping all the WAL files until the replicas came back,
> resuming from where they left off. If you stopped the primary first then
> the replicas would have just lost their connection until the primary came
> back and made its slots available again.
>
> The only time the slots don't stick around is if you do a failover from
> the primary to one of the replicas. In that case, yes, you do have to
> recreate the slots. When things were rebooted, was there any failovers
> kicked off?
>
> I'd check your postgres logs to see if there's any hint as to why things
> turned out the way they did. If all systems were rebooted at the exact same
> time, there may be some edge-case bug that's not being accounted for. But
> without a deeper dive into what happened, that would be hard to say and
> seems unlikely. I would say to make sure you are on the most recent version
> of 9.4 to ensure any known bug fixes are in place. Also highly recommend on
> planning to upgrade to a newer major version. 9.4 goes out of support this
> fall upon the release of version 12.
>
> Keith
>


-- 
*Mark Steben*
 Database Administrator
@utoRevenue <http://www.autorevenue.com/> | Autobase
<http://www.autobase.net/>
  CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com <http://www.autorevenue.com/>

<http://autobasedigital.net/marketing/DD12_sig.jpg>

[Attachment #3 (text/html)]

<div dir="ltr">Thank your for your prompt response Keith.   You were correct.   The \
root of the problem was an ssh host key change which caused all logshipping to error, \
and therefore pg_xlog queued up.  </div><br><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Sat, Aug 3, 2019 at 11:23 AM Keith &lt;<a \
href="mailto:keith@keithf4.com">keith@keithf4.com</a>&gt; wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div \
class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Aug 3, 2019 at 9:36 AM \
Mark Steben &lt;<a href="mailto:mark.steben@drivedominion.com" \
target="_blank">mark.steben@drivedominion.com</a>&gt; wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr">Good morning,<div>We run postgres \
9.4. Early Saturday morning we had a production postgres outage because our pg_xlog \
directory ran out of space.   Tracing the cause points to a scheduled reboot of the 3 \
database servers by our IT team to install some monitoring software.   Because we did \
a simple stop/restart of the database our replication slots did not get reset and \
pg_logs queued up until the out-of-space condition</div><div><br></div><div>Can \
someone please verify that the following WOULD HAVE BEEN the appropriate \
action:</div><div>(Or offer corrections)</div><div>   1. Stop the postgres database \
(on both slaves)</div><div>   2. Run pg_drop_replication_slot (rep1, rep2 slotnames)  \
(on Master)</div><div>   3. Stop the postgres database (on master)</div><div>   4.   \
Reboot Linux Server to bring in monitoring software</div><div>   5.   Start the \
postgres database (on master)</div><div>   6.   Run pg_create_replication_slot (rep1, \
rep2 slotnames) (On Master)</div><div>   7. Start the postgres database (on both \
slaves)</div><div><br></div><div>Thank you  <br clear="all"><div><br></div>-- \
<br><div dir="ltr" class="gmail-m_290382367811697379gmail-m_5277969213302503186gmail_signature"><div><span><font \
color="#888888"><span style="font-family:Calibri;font-size:11pt"><span \
style="color:rgb(51,51,51)"><b>Mark Steben</b></span> </span><br>  Database \
Administrator<span style="font-family:&quot;Century Gothic&quot;;font-size:8pt"><span \
style="color:rgb(51,51,51)"></span><br><a \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none" \
href="http://www.autorevenue.com/" target="_blank">@utoRevenue</a>  <span \
style="color:rgb(79,129,189);font-size:10pt">|</span>  <a \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none" \
href="http://www.autobase.net/" target="_blank">Autobase</a>  <span \
style="color:rgb(79,129,189);font-size:10pt"></span> <br>   <span \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none">CRM \
division of Dominion Dealer Solutions</span>  <br><span \
style="color:gray;font-size:8pt">95D Ashley Ave.<br>West Springfield, MA 01089</span> \
<br> <span style="color:gray;font-size:8pt">t: <a \
value="+14132434800">413.327-3045</a></span> <br><span \
style="color:gray;font-size:8pt">f: <a value="+14132434800">413.383-9567</a></span> \
</span></font></span></div><p><span><font color="#888888"><span \
style="font-family:&quot;Century Gothic&quot;;font-size:8pt"><a \
href="http://www.fb.com/DominionDealerSolutions" \
target="_blank">www.fb.com/DominionDealerSolutions</a><br><a \
href="http://www.twitter.com/DominionDealer" \
target="_blank">www.twitter.com/DominionDealer</a><br><span \
style="color:gray;font-size:8pt">  <a href="http://www.autorevenue.com/" \
target="_blank">www.drivedominion.com</a><br><br><a \
href="http://autobasedigital.net/marketing/DD12_sig.jpg" \
target="_blank"></a><br><br></span></span></font></span><br></p></div></div></div></blockquote><div><br></div><div>Restarting \
your databases should not have affected the replication slots like this. If you \
stopped the replicas first, then the primary would have just started keeping all the \
WAL files until the replicas came back, resuming from where they left off. If you \
stopped the primary first then the replicas would have just lost their connection \
until the primary came back and made its slots available \
again.</div><div><br></div><div>The only time the slots don&#39;t stick around is if \
you do a failover from the primary to one of the replicas. In that case, yes, you do \
have to recreate the slots. When things were rebooted, was there any failovers kicked \
off?<br></div><div><br></div><div>I&#39;d check your postgres logs to see if \
there&#39;s any hint as to why things turned out the way they did. If all systems \
were rebooted at the exact same time, there may be some edge-case bug that&#39;s not \
being accounted for. But without a deeper dive into what happened, that would be hard \
to say and seems unlikely. I would say to make sure you are on the most recent \
version of 9.4 to ensure any known bug fixes are in place. Also highly recommend on \
planning to upgrade to a newer major version. 9.4 goes out of support this fall upon \
the release of version 12.</div><div><br></div><div>Keith<br></div></div></div> \
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" \
class="gmail_signature"><div><span><font color="#888888"><span \
style="font-family:Calibri;font-size:11pt"><span style="color:rgb(51,51,51)"><b>Mark \
Steben</b></span> </span><br>  Database Administrator<span \
style="font-family:&quot;Century Gothic&quot;;font-size:8pt"><span \
style="color:rgb(51,51,51)"></span><br><a \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none" \
href="http://www.autorevenue.com/" target="_blank">@utoRevenue</a>  <span \
style="color:rgb(79,129,189);font-size:10pt">|</span>  <a \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none" \
href="http://www.autobase.net/" target="_blank">Autobase</a>  <span \
style="color:rgb(79,129,189);font-size:10pt"></span> <br>   <span \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none">CRM \
division of Dominion Dealer Solutions</span>  <br><span \
style="color:gray;font-size:8pt">95D Ashley Ave.<br>West Springfield, MA 01089</span> \
<br> <span style="color:gray;font-size:8pt">t: <a \
value="+14132434800">413.327-3045</a></span> <br><span \
style="color:gray;font-size:8pt">f: <a value="+14132434800">413.383-9567</a></span> \
</span></font></span></div><p><span><font color="#888888"><span \
style="font-family:&quot;Century Gothic&quot;;font-size:8pt"><a \
href="http://www.fb.com/DominionDealerSolutions" \
target="_blank">www.fb.com/DominionDealerSolutions</a><br><a \
href="http://www.twitter.com/DominionDealer" \
target="_blank">www.twitter.com/DominionDealer</a><br><span \
style="color:gray;font-size:8pt">  <a href="http://www.autorevenue.com/" \
target="_blank">www.drivedominion.com</a><br><br><a \
href="http://autobasedigital.net/marketing/DD12_sig.jpg" \
target="_blank"></a><br><br></span></span></font></span><br><span><font \
color="#888888"><span style="font-family:&quot;Century \
Gothic&quot;;font-size:8pt"><span \
style="color:gray;font-size:8pt"><br></span></span></font></span></p><font \
color="#888888"></font></div>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic