'Re: [PERFORM] Strange query stalls on replica in 9.3.9'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-performance
Subject:    Re: [PERFORM] Strange query stalls on replica in 9.3.9
From:       Jeff Janes <jeff.janes () gmail ! com>
Date:       2015-08-18 17:44:41
Message-ID: CAMkU=1yewz7mZ37EEh_98pc3p3PKL9eKNWxLFmaC2zhrYyZTnA () mail ! gmail ! com
[Download RAW message or body]

On Fri, Aug 14, 2015 at 9:54 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

> On Fri, Aug 14, 2015 at 9:34 AM, Josh Berkus <josh@agliodbs.com> wrote:
>
>>
>> On 08/13/2015 01:59 PM, Jeff Janes wrote: execute on the
>>
>> > Once the commit of the whole-table update has replayed, the problem
>> > should go way instantly because at that point each backend doing the
>> > seqscan will find the the transaction has committed and so will set the
>> > hint bit that means all of the other seqscan backends that come after it
>> > can skip the proc array scan for that tuple.
>>
>> Yes ... and given that the commit on the master took < 3 seconds, it's
>> not likely to take 30 seconds on the replica.  That aside, the pattern
>> of behavior does look similar to the planner issue.
>>
>
> Another thought.  Who actually sets the hint bits on a replica?
>
> Do the read-only processes on the replica which discovers a tuple to have
> been securely committed set the hint bits?
>
> My benchmarking suggests not.
>

The hint bits only get set if the commit lsn of the transaction of the
tuple being hinted (*not* the page lsn) thinks it has already been flushed
to WAL. On master the transaction commit record usually would have already
flushed its own WAL, or if async then wal writer is going to take care of
this fairly soon if nothing else gets to it first.

On the standby, it looks like the only thing that updates the
thinks-it-has-been-flushed-to marker (which is stored in the control file,
rather than memory) is either the eviction of a dirty buffer, or the
completion of a restartpoint.  I could easily be wrong on that, though.

In any case, you empirically can have committed but unhintable tuples
hanging around for prolonged amounts of time on the standby.  Perhaps
standbys need a wal writer process.

Cheers,

Jeff

[Attachment #3 (text/html)]

<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 14, \
2015 at 9:54 AM, Jeff Janes <span dir="ltr">&lt;<a href="mailto:jeff.janes@gmail.com" \
target="_blank">jeff.janes@gmail.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div \
class="gmail_quote"><span class="">On Fri, Aug 14, 2015 at 9:34 AM, Josh Berkus <span \
dir="ltr">&lt;<a href="mailto:josh@agliodbs.com" \
target="_blank">josh@agliodbs.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"><br> On 08/13/2015 01:59 PM, Jeff Janes wrote: execute on \
the<br> <span><br>
&gt; Once the commit of the whole-table update has replayed, the problem<br>
&gt; should go way instantly because at that point each backend doing the<br>
&gt; seqscan will find the the transaction has committed and so will set the<br>
&gt; hint bit that means all of the other seqscan backends that come after it<br>
&gt; can skip the proc array scan for that tuple.<br>
<br>
</span>Yes ... and given that the commit on the master took &lt; 3 seconds, \
it&#39;s<br> not likely to take 30 seconds on the replica.   That aside, the \
pattern<br> of behavior does look similar to the planner \
issue.<br></blockquote><div><br></div></span><div>Another thought.   Who actually \
sets the hint bits on a replica?   </div><div><br></div><div>Do the read-only \
processes on the replica which discovers a tuple to have been securely committed set \
the hint bits?</div><div><br></div><div>My benchmarking suggests \
not.</div></div></div></div></blockquote><div><br></div><div>The hint bits only get \
set if the commit lsn of the transaction of the tuple being hinted (*not* the page \
lsn) thinks it has already been flushed to WAL. On master the transaction commit \
record usually would have already flushed its own WAL, or if async then wal writer is \
going to take care of this fairly soon if nothing else gets to it \
first.</div><div><br></div><div>On the standby, it looks like the only thing that \
updates the thinks-it-has-been-flushed-to marker (which is stored in the control \
file, rather than memory) is either the eviction of a dirty buffer, or the completion \
of a restartpoint.   I could easily be wrong on that, though.   \
</div><div><br></div><div>In any case, you empirically can have committed but \
unhintable tuples hanging around for prolonged amounts of time on the standby.   \
Perhaps standbys need a wal writer \
process.</div><div><br></div><div>Cheers,<br></div><div><br></div><div>Jeff</div></div></div></div>




[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic