[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-admin
Subject:    Re: [ADMIN] problems using pg_start_backup/pg_stop_backup and pg_basebackup at same time
From:       Magnus Hagander <magnus () hagander ! net>
Date:       2016-07-01 11:30:45
Message-ID: CABUevEy4gRxCa8jOSWK7miS02ogE37kh49yjo9nwTdb3=cT0OQ () mail ! gmail ! com
[Download RAW message or body]

On Mon, Jun 27, 2016 at 8:27 PM, Alex Malek <magicagent@gmail.com> wrote:

> Should I file a bug?
>

Oh, sorry. I did not spot that this was posted to pgsql-admin, which has a
lot fewer readers. I've moved it over to -hackers where more people will
see it.

(And also, please don't top-post on these lists, as it makes the discussion
much harder to follow)



>
> Does anyone know of a workaround?
>
> I could try running pg_basebackup w/ --xlog-method=fetch and see if that
> helps but I do like knowing that the base backup will have all needed WAL
> files regardless of WAL retention policy on master.
>
> Thanks.
> Alex
>
> Recap of what I *think* is going on:
>
> 1) pg_basebackup w/ --xlog-method=stream is started
> 2) archive_command is not executed while pg_basebackup() is running
>

There is no reason why archive_command should not run during a
pg_basebackup run. Do you get anything at all in the logfile indicating
that this would be the case? If you look at the ps output, does the
archiver process indicate what it's doing, and does it have a sub-process
for your rsync command?

Is there any chance that for example the network or other part of the
system gets saturated and the archive_command is simply runniung too slow
to keep up?


3) while pg_basebackup() is running pg_start_backup() is executed
> 4) Later pg_stop_backup() is executed.
> 5) pg_stop_backup() hangs related to the archive_command not running
> 6) hours later when pg_basebackup finishes, back logged archive commands
> get executed then pg_stop_backup() finishes
>

This would be consistent with (2), but there is no reason why (2) would
happen because of pg_basebackup unless there's an actual bug there.

//Magnus



> On Mon, Jun 20, 2016 at 10:15 AM, Alex Malek <magicagent@gmail.com> wrote:
>
>>
>> Thanks for the response Magnus.
>> I've replied inline below.
>>
>> On Mon, Jun 20, 2016 at 4:17 AM, Magnus Hagander <magnus@hagander.net>
>> wrote:
>>
>>> On Mon, Jun 13, 2016 at 6:59 PM, Alex Malek <magicagent@gmail.com>
>>> wrote:
>>>
>>>>
>>>> I am experiencing two problems
>>>> 1) pg_stop_backup hangs until pg_basebackup finishes
>>>> 2) WAL contains references to invalid pages
>>>>
>>>> (I suspect when 1. is fixed I won't experience 2. )
>>>>
>>>> Full description below:
>>>>
>>>>
>>>> postgresql: 9.3.13
>>>>
>>>> My situation is I create a writable report database every day using
>>>> pg_start_backup() / rsync / pg_stop_backup()
>>>> (whole process takes 1-2 hours, the rsync actually copies the data dir
>>>> of a  slave/warm spare DB)
>>>>
>>>
>>> Taking what is called an exclusive backup, which is what
>>> pg_start_backup() does, is not supported off a standby node. Only
>>> pg_basebackup is supported against a standby node.
>>>
>>> Can you reproduce your errors if you make this backup from the master?
>>>
>>
>>
>> pg_start_backup() is run on the master.
>> The rsync copies the data dir of the standby b/c it is local.
>> I have tried rsyncing from the master w/ the same results.
>>
>>
>>>
>>>
>>>
>>>> Also once a week I create a database for backup/archive purposes using
>>>> pg_basebackup
>>>> (whole process takes about 13 hours)
>>>>
>>>>
>>>> These two processes used to be able to coincide until recently.
>>>> Recent changes include a major debian upgrade and a minor version of
>>>> postgres upgrade from 9.3.10.
>>>>
>>>> Now when the report sync occurs during the pg_basebackup (w/
>>>> --xlog-method=stream option) the pg_stop_backup() hangs until
>>>> the the pg_basebackup has completed (4 hours later).
>>>>
>>>
>>> Do you also have an archive_command, and what is it? Normally,
>>> pg_stop_backup() blocks on the archive command - so perhaps you have ended
>>> up with a dependency between that one and the base backup command somehow?
>>>
>>>
>> The archive command is:
>>
>> archive_command = 'rsync -a %p postgres@archivedb
>> :/storage/postgres/9.3/main-archive/pg_xlog/%f'
>>
>> I thought the archive command was working properly while pg_stop_backup
>> was hanging based on the timestamps of the files, but upon closer
>> inspection using stat I see that the files actually were not copied over
>> until pg_stop_backup stopped hanging.
>>
>> Actually it appears that the archive_command is not executed the entire
>> time pg_basebackup is running.
>> pg_basebackup is started about 5 hours before pg_start_backup is executed.
>> Based on stat it looks like the wal files get written locally but do not
>> get rsynced until pg_basebackup is done.
>>
>>
>>
>>>
>>>
>>>> A labeled WAL backup file is created when the pg_stop_backup() is first
>>>> executed and another
>>>> is created when the pg_basebackup completes.
>>>>
>>>>
>>>>
>>>> While the pg_stop_backup() hangs the following appears in the logs:
>>>>
>>>> 2016-06-11 07:50:45 EDT: WARNING:  pg_stop_backup still waiting for all
>>>> required WAL segments to be archived (7680 seconds elapsed)
>>>> 2016-06-11 07:50:45 EDT: HINT:  Check that your archive_command is
>>>> executing properly.  pg_stop_backup can be canceled safely, but the
>>>> database backup will not be usable without all the WAL segments.
>>>>
>>>>
>>>> until eventually:
>>>>
>>>> 2016-06-11 12:59:07 EDT:LOG:  duration: 26190082.497 ms  statement:
>>>> SELECT pg_stop_backup()
>>>>
>>>> The archive command appears to work as WAL files are being archived
>>>> successfully with timestamps that occur while pg_stop_backup is hanging.
>>>>
>>>> -rw------- 1 postgres postgres 314 Jun 11 12:56
>>>> 000000030000109200000066.00000028.backup
>>>> -rw------- 1 postgres postgres 16M Jun 11 12:56 000000030000109300000029
>>>> -rw------- 1 postgres postgres 16M Jun 11 12:54 000000030000109300000028
>>>> -rw------- 1 postgres postgres 16M Jun 11 12:52 000000030000109300000027
>>>> <snip>
>>>> -rw------- 1 postgres postgres 16M Jun 11 06:22 0000000300001092000000F1
>>>> -rw------- 1 postgres postgres 335 Jun 11 05:42
>>>> 0000000300001092000000D4.00000028.backup
>>>> -rw------- 1 postgres postgres 16M Jun 11 05:42 0000000300001092000000F0
>>>>
>>>> Content of 0000000300001092000000D4.00000028.backup:
>>>>
>>>> START WAL LOCATION: 1092/D4000028 (file 0000000300001092000000D4)
>>>> STOP WAL LOCATION: 1092/F008EA10 (file 0000000300001092000000F0)
>>>> CHECKPOINT LOCATION: 1092/D4004E08
>>>> BACKUP METHOD: pg_start_backup
>>>> BACKUP FROM: master
>>>> START TIME: 2016-06-11 04:51:04 EDT
>>>> LABEL: reporting 2016-06-11 04:51:03.892804-04
>>>> STOP TIME: 2016-06-11 05:42:37 EDT
>>>>
>>>
>>> This file shows that the backup took about an hour, doesn't it? So it
>>> doesn't look like it was waiting?
>>>
>>> Or are you saying the wait happens *after* the .backup file has been
>>> dropped in the archives?
>>>
>>
>> The .backup file gets written locally in a timely manner but doesn't get
>> dropped into the archive until after the
>> pg_stop_backup() stops hanging.
>>
>> So it appears that pg_basebackup ( perhaps --xlog-method=stream ? ) is
>> causing the archive_command to pause which in turn
>> causes pg_stop_backup() to hang?
>>
>> Thanks.
>>
>> Alex
>>
>>
>

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

[Attachment #3 (text/html)]

<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jun \
27, 2016 at 8:27 PM, Alex Malek <span dir="ltr">&lt;<a \
href="mailto:magicagent@gmail.com" \
target="_blank">magicagent@gmail.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"><div dir="ltr"><div><div>Should I file a \
bug?<br></div></div></div></blockquote><div><br></div><div>Oh, sorry. I did not spot \
that this was posted to pgsql-admin, which has a lot fewer readers. I&#39;ve moved it \
over to -hackers where more people will see it.</div><div><br></div><div>(And also, \
please don&#39;t top-post on these lists, as it makes the discussion much harder to \
follow)</div><div><br></div><div>  </div><blockquote class="gmail_quote" \
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div \
dir="ltr"><div><div><br></div>Does anyone know of a workaround?<br><br></div>I could \
try running pg_basebackup w/ <span><span></span></span><span> --xlog-method=fetch and \
see if that helps but I do like knowing that the base backup will have all needed WAL \
files regardless of WAL retention policy on \
master.<br></span><div><div><br><div>Thanks.<br clear="all"><div><div \
data-smartmail="gmail_signature">Alex <br><br></div><div \
data-smartmail="gmail_signature">Recap of what I *think* is going \
on:<br><br><span><span>1) pg_basebackup w/ </span></span><span> --xlog-method=stream \
is started</span><br>2) archive_command is not executed while \
<span><span>pg_basebackup() is \
running<br></span></span></div></div></div></div></div></div></blockquote><div><br></div><div>There \
is no reason why archive_command should not run during a pg_basebackup run. Do you \
get anything at all in the logfile indicating that this would be the case? If you \
look at the ps output, does the archiver process indicate what it&#39;s doing, and \
does it have a sub-process for your rsync command?</div><div><br></div><div>Is there \
any chance that for example the network or other part of the system gets saturated \
and the archive_command is simply runniung too slow to keep \
up?</div><div><br></div><div><br></div><blockquote class="gmail_quote" \
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div \
dir="ltr"><div><div><div><div><div \
data-smartmail="gmail_signature"><span><span></span></span></div><div \
data-smartmail="gmail_signature"><span><span>3) while \
</span></span><span><span>pg_basebackup() is running \
</span></span><span><span>pg_start_backup() is executed<br></span></span></div><div \
data-smartmail="gmail_signature"><span><span>4) Later </span></span>pg_stop_backup() \
is executed.<br><span></span>5) pg_stop_backup() hangs related to the archive_command \
not running<br><span><span>6) hours later when pg_basebackup finishes, back logged \
archive commands get executed then </span></span>pg_stop_backup() \
finishes</div></div></div></div></div></div></blockquote><div><br></div><div>This \
would be consistent with (2), but there is no reason why (2) would happen because of \
pg_basebackup unless there&#39;s an actual bug \
there.</div><div><br></div><div>//Magnus</div><div><br></div><div>  </div><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"><div dir="ltr"><div><div><div><div class="h5"><div \
class="gmail_extra"><div class="gmail_quote">On Mon, Jun 20, 2016 at 10:15 AM, Alex \
Malek <span dir="ltr">&lt;<a href="mailto:magicagent@gmail.com" \
target="_blank">magicagent@gmail.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><br></div>Thanks for the \
response Magnus.<br></div>I&#39;ve replied inline below.<br><div><div><br><div \
class="gmail_extra"><div class="gmail_quote"><span>On Mon, Jun 20, 2016 at 4:17 AM, \
Magnus Hagander <span dir="ltr">&lt;<a href="mailto:magnus@hagander.net" \
target="_blank">magnus@hagander.net</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div \
class="gmail_quote"><span>On Mon, Jun 13, 2016 at 6:59 PM, Alex Malek <span \
dir="ltr">&lt;<a href="mailto:magicagent@gmail.com" \
target="_blank">magicagent@gmail.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><br></div><div>I am \
experiencing two problems<br>1) pg_stop_backup hangs until pg_basebackup \
finishes<br></div><div>2) WAL contains references to invalid \
pages<br><br></div><div>(I suspect when 1. is fixed I won&#39;t experience 2. \
)<br><br></div><div>Full description below:<br></div><div><br><br>postgresql: \
9.3.13<br><br></div><div>My situation is I create a writable report database every \
day using pg_start_backup() / rsync / pg_stop_backup()<br></div><div>(whole process \
takes 1-2 hours, the rsync actually copies the data dir of a   slave/warm spare \
DB)<br></div></div></blockquote><div><br></div></span><div>Taking what is called an \
exclusive backup, which is what pg_start_backup() does, is not supported off a \
standby node. Only pg_basebackup is supported against a standby \
node.</div><div><br></div><div>Can you reproduce your errors if you make this backup \
from the master?</div></div></div></div></blockquote></span><div><br><br> \
pg_start_backup() is run on the master.<br></div><div>The rsync copies the data dir \
of the standby b/c it is local.<br></div><div>I have tried rsyncing from the master \
w/ the same results.<br></div><span><div>  </div><blockquote class="gmail_quote" \
style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div \
class="gmail_quote"><span><div><br></div><div>  </div><blockquote class="gmail_quote" \
style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Also once a week I create a \
database for backup/archive purposes using pg_basebackup<br></div><div>(whole process \
takes about 13 hours)</div><br><br><div>These two processes used to be able to \
coincide until recently.<br>Recent changes include a major debian upgrade and a minor \
version of postgres upgrade from 9.3.10.<br><br></div><div>Now when the report sync \
occurs during the pg_basebackup (w/ --xlog-method=stream option) the pg_stop_backup() \
hangs until<br></div><div>the the pg_basebackup has completed (4 hours \
later).<br></div></div></blockquote><div><br></div></span><div>Do you also have an \
archive_command, and what is it? Normally, pg_stop_backup() blocks on the archive \
command - so perhaps you have ended up with a dependency between that one and the \
base backup command somehow?</div><div><div><div><br></div></div></div></div></div></div></blockquote><div><br></div></span><div>The \
archive command is:<br><br>archive_command = &#39;rsync -a %p \
postgres@archivedb:/storage/postgres/9.3/main-archive/pg_xlog/%f&#39;<br><br></div><div>I \
thought the archive command was working properly while pg_stop_backup was hanging \
based on the timestamps of the files, but upon closer<br></div><div>inspection using \
stat I see that the files actually were not copied over until pg_stop_backup stopped \
hanging.<br></div><br></div><div class="gmail_quote">Actually it appears that the \
archive_command is not executed the entire time pg_basebackup is \
running.<br>pg_basebackup is started about 5 hours before pg_start_backup is \
executed.<br></div><div class="gmail_quote">Based on stat it looks like the wal files \
get written locally but do not get rsynced until pg_basebackup is done.<br></div><div \
class="gmail_quote"><div><div><div><br></div><div>  </div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div \
class="gmail_quote"><div><div><div></div><div>  </div><blockquote class="gmail_quote" \
style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>A labeled WAL backup file is \
created when the pg_stop_backup() is first executed and another<br></div><div>is \
created when the pg_basebackup completes.<br><br><br><br></div><div>While the  \
pg_stop_backup() hangs the following appears in the logs:<br><br></div><div><font \
size="1"><span style="font-family:monospace,monospace">2016-06-11 07:50:45 EDT: \
WARNING:   pg_stop_backup still waiting for all required WAL segments to be archived \
(7680 seconds elapsed)<br>2016-06-11 07:50:45 EDT: HINT:   Check that your \
archive_command is executing properly.   pg_stop_backup can be canceled safely, but \
the database backup will not be usable without all the WAL \
segments.<br><br></span></font></div><div><br>until \
eventually:<br><br></div><div><font size="1"><span \
style="font-family:monospace,monospace">2016-06-11 12:59:07 EDT:LOG:   duration: \
26190082.497 ms   statement: SELECT \
pg_stop_backup()</span></font><br><br></div><div>The archive command appears to work \
as WAL files are being archived successfully with timestamps that occur while \
pg_stop_backup is hanging.<br><br><font size="1"><span \
style="font-family:monospace,monospace">-rw------- 1 postgres postgres 314 Jun 11 \
12:56 000000030000109200000066.00000028.backup<br>-rw------- 1 postgres postgres 16M \
Jun 11 12:56 000000030000109300000029<br>-rw------- 1 postgres postgres 16M Jun 11 \
12:54 000000030000109300000028<br>-rw------- 1 postgres postgres 16M Jun 11 12:52 \
000000030000109300000027<br></span></font></div><div><span \
style="font-family:monospace,monospace"><font size="1">&lt;snip&gt;<br>-rw------- 1 \
postgres postgres 16M Jun 11 06:22 0000000300001092000000F1<br>-rw------- 1 postgres \
postgres 335 Jun 11 05:42 0000000300001092000000D4.00000028.backup<br>-rw------- 1 \
postgres postgres 16M Jun 11 05:42 \
0000000300001092000000F0</font><br><br></span></div><div><span \
style="font-family:monospace,monospace"><font \
face="arial,helvetica,sans-serif">Content of \
</font>0000000300001092000000D4.00000028.backup:<br><br><font size="1">START WAL \
LOCATION: 1092/D4000028 (file 0000000300001092000000D4)<br>STOP WAL LOCATION: \
1092/F008EA10 (file 0000000300001092000000F0)<br>CHECKPOINT LOCATION: \
1092/D4004E08<br>BACKUP METHOD: pg_start_backup<br>BACKUP FROM: master<br>START TIME: \
2016-06-11 04:51:04 EDT<br>LABEL: reporting 2016-06-11 04:51:03.892804-04<br>STOP \
TIME: 2016-06-11 05:42:37 \
EDT</font></span></div></div></blockquote><div><br></div></div></div><div>This file \
shows that the backup took about an hour, doesn&#39;t it? So it doesn&#39;t look like \
it was waiting?</div><div><br></div><div>Or are you saying the wait happens *after* \
the .backup file has been dropped in the \
archives?</div></div></div></div></blockquote><div><br></div></div></div><div>The \
.backup file gets written locally in a timely manner but doesn&#39;t get dropped into \
the archive until after the<br> pg_stop_backup() stops hanging.<br><br>So it appears \
that pg_basebackup ( perhaps<span> --xlog-method=stream ? ) is causing the \
archive_command to pause which in turn<br></span></div><div><span>causes \
</span>pg_stop_backup() to hang?<br><br></div><div>Thanks.<span><font \
color="#888888"><br></font></span></div><span><font \
color="#888888"><div><br></div><div>Alex<br></div><div><br></div></font></span></div></div></div></div></div>
 </blockquote></div><br></div></div></div></div></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div class="gmail_signature" \
data-smartmail="gmail_signature">  Magnus Hagander<br>  Me: <a \
href="http://www.hagander.net/" target="_blank">http://www.hagander.net/</a><br>  \
Work: <a href="http://www.redpill-linpro.com/" \
target="_blank">http://www.redpill-linpro.com/</a></div> </div></div>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic