[prev in list] [next in list] [prev in thread] [next in thread]
List: postgresql-admin
Subject: Re: Streaming replication question
From: Mark Steben <mark.steben () drivedominion ! com>
Date: 2019-08-04 11:06:32
Message-ID: CADyzmyyG6i9fBcBVPXKzRZLBp23Sb9OWGpYYdYQT1iPNU8dYxw () mail ! gmail ! com
[Download RAW message or body]
Thank your for your prompt response Keith. You were correct. The root of
the problem was an ssh host key change which caused all logshipping to
error, and therefore pg_xlog queued up.
On Sat, Aug 3, 2019 at 11:23 AM Keith <keith@keithf4.com> wrote:
>
>
> On Sat, Aug 3, 2019 at 9:36 AM Mark Steben <mark.steben@drivedominion.com>
> wrote:
>
>> Good morning,
>> We run postgres 9.4. Early Saturday morning we had a production postgres
>> outage because our pg_xlog directory ran out of space. Tracing the cause
>> points to a scheduled reboot of the 3 database servers by our IT team to
>> install some monitoring software. Because we did a simple stop/restart of
>> the database our replication slots did not get reset and pg_logs queued up
>> until the out-of-space condition
>>
>> Can someone please verify that the following WOULD HAVE BEEN the
>> appropriate action:
>> (Or offer corrections)
>> 1. Stop the postgres database (on both slaves)
>> 2. Run pg_drop_replication_slot (rep1, rep2 slotnames) (on Master)
>> 3. Stop the postgres database (on master)
>> 4. Reboot Linux Server to bring in monitoring software
>> 5. Start the postgres database (on master)
>> 6. Run pg_create_replication_slot (rep1, rep2 slotnames) (On Master)
>> 7. Start the postgres database (on both slaves)
>>
>> Thank you
>>
>> --
>> *Mark Steben*
>> Database Administrator
>> @utoRevenue <http://www.autorevenue.com/> | Autobase
>> <http://www.autobase.net/>
>> CRM division of Dominion Dealer Solutions
>> 95D Ashley Ave.
>> West Springfield, MA 01089
>> t: 413.327-3045
>> f: 413.383-9567
>>
>> www.fb.com/DominionDealerSolutions
>> www.twitter.com/DominionDealer
>> www.drivedominion.com <http://www.autorevenue.com/>
>>
>> <http://autobasedigital.net/marketing/DD12_sig.jpg>
>>
>>
>>
> Restarting your databases should not have affected the replication slots
> like this. If you stopped the replicas first, then the primary would have
> just started keeping all the WAL files until the replicas came back,
> resuming from where they left off. If you stopped the primary first then
> the replicas would have just lost their connection until the primary came
> back and made its slots available again.
>
> The only time the slots don't stick around is if you do a failover from
> the primary to one of the replicas. In that case, yes, you do have to
> recreate the slots. When things were rebooted, was there any failovers
> kicked off?
>
> I'd check your postgres logs to see if there's any hint as to why things
> turned out the way they did. If all systems were rebooted at the exact same
> time, there may be some edge-case bug that's not being accounted for. But
> without a deeper dive into what happened, that would be hard to say and
> seems unlikely. I would say to make sure you are on the most recent version
> of 9.4 to ensure any known bug fixes are in place. Also highly recommend on
> planning to upgrade to a newer major version. 9.4 goes out of support this
> fall upon the release of version 12.
>
> Keith
>
--
*Mark Steben*
Database Administrator
@utoRevenue <http://www.autorevenue.com/> | Autobase
<http://www.autobase.net/>
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com <http://www.autorevenue.com/>
<http://autobasedigital.net/marketing/DD12_sig.jpg>
[Attachment #3 (text/html)]
<div dir="ltr">Thank your for your prompt response Keith. You were correct. The \
root of the problem was an ssh host key change which caused all logshipping to error, \
and therefore pg_xlog queued up. </div><br><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Sat, Aug 3, 2019 at 11:23 AM Keith <<a \
href="mailto:keith@keithf4.com">keith@keithf4.com</a>> wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div \
class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Aug 3, 2019 at 9:36 AM \
Mark Steben <<a href="mailto:mark.steben@drivedominion.com" \
target="_blank">mark.steben@drivedominion.com</a>> wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr">Good morning,<div>We run postgres \
9.4. Early Saturday morning we had a production postgres outage because our pg_xlog \
directory ran out of space. Tracing the cause points to a scheduled reboot of the 3 \
database servers by our IT team to install some monitoring software. Because we did \
a simple stop/restart of the database our replication slots did not get reset and \
pg_logs queued up until the out-of-space condition</div><div><br></div><div>Can \
someone please verify that the following WOULD HAVE BEEN the appropriate \
action:</div><div>(Or offer corrections)</div><div> 1. Stop the postgres database \
(on both slaves)</div><div> 2. Run pg_drop_replication_slot (rep1, rep2 slotnames) \
(on Master)</div><div> 3. Stop the postgres database (on master)</div><div> 4. \
Reboot Linux Server to bring in monitoring software</div><div> 5. Start the \
postgres database (on master)</div><div> 6. Run pg_create_replication_slot (rep1, \
rep2 slotnames) (On Master)</div><div> 7. Start the postgres database (on both \
slaves)</div><div><br></div><div>Thank you <br clear="all"><div><br></div>-- \
<br><div dir="ltr" class="gmail-m_290382367811697379gmail-m_5277969213302503186gmail_signature"><div><span><font \
color="#888888"><span style="font-family:Calibri;font-size:11pt"><span \
style="color:rgb(51,51,51)"><b>Mark Steben</b></span> </span><br> Database \
Administrator<span style="font-family:"Century Gothic";font-size:8pt"><span \
style="color:rgb(51,51,51)"></span><br><a \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none" \
href="http://www.autorevenue.com/" target="_blank">@utoRevenue</a> <span \
style="color:rgb(79,129,189);font-size:10pt">|</span> <a \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none" \
href="http://www.autobase.net/" target="_blank">Autobase</a> <span \
style="color:rgb(79,129,189);font-size:10pt"></span> <br> <span \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none">CRM \
division of Dominion Dealer Solutions</span> <br><span \
style="color:gray;font-size:8pt">95D Ashley Ave.<br>West Springfield, MA 01089</span> \
<br> <span style="color:gray;font-size:8pt">t: <a \
value="+14132434800">413.327-3045</a></span> <br><span \
style="color:gray;font-size:8pt">f: <a value="+14132434800">413.383-9567</a></span> \
</span></font></span></div><p><span><font color="#888888"><span \
style="font-family:"Century Gothic";font-size:8pt"><a \
href="http://www.fb.com/DominionDealerSolutions" \
target="_blank">www.fb.com/DominionDealerSolutions</a><br><a \
href="http://www.twitter.com/DominionDealer" \
target="_blank">www.twitter.com/DominionDealer</a><br><span \
style="color:gray;font-size:8pt"> <a href="http://www.autorevenue.com/" \
target="_blank">www.drivedominion.com</a><br><br><a \
href="http://autobasedigital.net/marketing/DD12_sig.jpg" \
target="_blank"></a><br><br></span></span></font></span><br></p></div></div></div></blockquote><div><br></div><div>Restarting \
your databases should not have affected the replication slots like this. If you \
stopped the replicas first, then the primary would have just started keeping all the \
WAL files until the replicas came back, resuming from where they left off. If you \
stopped the primary first then the replicas would have just lost their connection \
until the primary came back and made its slots available \
again.</div><div><br></div><div>The only time the slots don't stick around is if \
you do a failover from the primary to one of the replicas. In that case, yes, you do \
have to recreate the slots. When things were rebooted, was there any failovers kicked \
off?<br></div><div><br></div><div>I'd check your postgres logs to see if \
there's any hint as to why things turned out the way they did. If all systems \
were rebooted at the exact same time, there may be some edge-case bug that's not \
being accounted for. But without a deeper dive into what happened, that would be hard \
to say and seems unlikely. I would say to make sure you are on the most recent \
version of 9.4 to ensure any known bug fixes are in place. Also highly recommend on \
planning to upgrade to a newer major version. 9.4 goes out of support this fall upon \
the release of version 12.</div><div><br></div><div>Keith<br></div></div></div> \
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" \
class="gmail_signature"><div><span><font color="#888888"><span \
style="font-family:Calibri;font-size:11pt"><span style="color:rgb(51,51,51)"><b>Mark \
Steben</b></span> </span><br> Database Administrator<span \
style="font-family:"Century Gothic";font-size:8pt"><span \
style="color:rgb(51,51,51)"></span><br><a \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none" \
href="http://www.autorevenue.com/" target="_blank">@utoRevenue</a> <span \
style="color:rgb(79,129,189);font-size:10pt">|</span> <a \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none" \
href="http://www.autobase.net/" target="_blank">Autobase</a> <span \
style="color:rgb(79,129,189);font-size:10pt"></span> <br> <span \
style="color:rgb(79,129,189);font-size:8pt;font-weight:bold;text-decoration:none">CRM \
division of Dominion Dealer Solutions</span> <br><span \
style="color:gray;font-size:8pt">95D Ashley Ave.<br>West Springfield, MA 01089</span> \
<br> <span style="color:gray;font-size:8pt">t: <a \
value="+14132434800">413.327-3045</a></span> <br><span \
style="color:gray;font-size:8pt">f: <a value="+14132434800">413.383-9567</a></span> \
</span></font></span></div><p><span><font color="#888888"><span \
style="font-family:"Century Gothic";font-size:8pt"><a \
href="http://www.fb.com/DominionDealerSolutions" \
target="_blank">www.fb.com/DominionDealerSolutions</a><br><a \
href="http://www.twitter.com/DominionDealer" \
target="_blank">www.twitter.com/DominionDealer</a><br><span \
style="color:gray;font-size:8pt"> <a href="http://www.autorevenue.com/" \
target="_blank">www.drivedominion.com</a><br><br><a \
href="http://autobasedigital.net/marketing/DD12_sig.jpg" \
target="_blank"></a><br><br></span></span></font></span><br><span><font \
color="#888888"><span style="font-family:"Century \
Gothic";font-size:8pt"><span \
style="color:gray;font-size:8pt"><br></span></span></font></span></p><font \
color="#888888"></font></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic