
List:       postgresql-general
Subject:    Re: [HACKERS] XLogFlush
From:       Jeff Janes <jeff.janes () gmail ! com>
Date:       2009-08-31 15:48:19
Message-ID: f67928030908310848w32a4d4bcr835a7f54cb2b1f91 () mail ! gmail ! com

On Fri, Aug 21, 2009 at 1:18 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

> Maybe this is one of those things that is obvious when someone points
> it out to you, but right now I am not seeing it.  If you look at the
> last eight lines of this snippet from XLogFlush, you see that if we
> obtain WriteRqstPtr under the WALInsertLock, then we both write and
> flush up to the highest write request.  But if we obtain it under the
> info_lck, then we write up to the highest write request but flush only
> up to our own record's flush request.  Why the disparate treatment?
> The effect of this seems to be that when WALInsertLock is busy, group
> commits are suppressed.
>

I realized I was misinterpreting this.  XLogWrite doesn't flush only up to
WriteRqst.Flush, because fsync doesn't work that way: it syncs everything
already written to the file, not just a prefix of it.  If XLogWrite flushes
at all (which I think it always will when invoked from XLogFlush, as
otherwise XLogFlush would not call it), it flushes up to WriteRqst.Write
anyway, even if WriteRqst.Flush is behind.  So as long as
record <= WriteRqst.Flush <= WriteRqst.Write, it doesn't matter exactly
what WriteRqst.Flush is.

The real problem with group commit on a busy WALInsertLock is timing.  If
xlogctl->LogwrtRqst.Write does get advanced by someone else, that will
almost surely happen while we are waiting on the WALWriteLock.  That is
after we checked it under the protection of info_lck, so we never see the
advance.  We should probably have an else branch on the
LWLockConditionalAcquire so that if it fails, we take info_lck and check
again for advancement of xlogctl->LogwrtRqst.Write.

But since Simon is doing big changes as part of sync rep, I'll hold off on
doing much experimentation on this until then.



>                LWLockRelease(WALInsertLock);
>                WriteRqst.Write = WriteRqstPtr;
>                WriteRqst.Flush = WriteRqstPtr;
>        }
>        else
>        {
>                WriteRqst.Write = WriteRqstPtr;
>                WriteRqst.Flush = record;
>        }
>

Cheers,

Jeff
