
List:       pgsql-performance
Subject:    Re: Replication lag due to lagging restart_lsn
From:       Kiran Singh <kiranjanarthan24 () gmail ! com>
Date:       2020-08-19 13:15:44
Message-ID: CAJ1hEevDaQoHgcr4eFCzHmyGAQLgkqxY4QRphybe+hJBwqhVTg () mail ! gmail ! com

When logical replication is set up, WAL generation on any table will
contribute to replication lag. Since you are running a long-running
transaction on the master, keep in mind that only about 4MB of changes
are kept in memory per transaction; anything beyond that is spilled to
disk. This is when you will start seeing:

1. Replication lag spiking
2. Storage being consumed
3. restart_lsn stops moving forward

You can confirm whether the heavy writes you describe are spilling to
disk by setting log_min_messages to DEBUG2 and then checking the server
log for messages about changes being spilled to disk.
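
For example, a minimal sketch of enabling that logging (assuming
superuser access; the exact DEBUG2 message text varies across versions,
but it mentions spilling changes to disk):

    -- Raise log verbosity so spill-to-disk activity shows up in the
    -- server log. DEBUG2 is very chatty; reset it once you are done.
    ALTER SYSTEM SET log_min_messages = 'debug2';
    SELECT pg_reload_conf();

    -- Run the write-heavy workload, then grep the server log for "spill".
    -- Afterwards, revert:
    ALTER SYSTEM RESET log_min_messages;
    SELECT pg_reload_conf();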

To answer your questions:

1. As long as the write-heavy transaction is running on the database, you
will not see restart_lsn moving.
2. You will have to break the work into smaller transactions.
3. Once the transaction completes, you will see restart_lsn moving forward
again; the query below shows how to watch the gap.
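
A minimal sketch of monitoring that gap from the master (pg_wal_lsn_diff
reports the difference in bytes; on 9.6 or older the function is
pg_xlog_location_diff):

    -- Per logical slot, how far restart_lsn trails confirmed_flush_lsn.
    SELECT slot_name,
           restart_lsn,
           confirmed_flush_lsn,
           pg_wal_lsn_diff(confirmed_flush_lsn, restart_lsn) AS bytes_behind
      FROM pg_replication_slots
     WHERE slot_type = 'logical';

If bytes_behind keeps growing while your subscriber acknowledges
promptly, the slot is being held back by decoding the long transaction,
not by the network.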

On Tue, Aug 18, 2020 at 11:27 AM Satyam Shekhar <satyamshekhar@gmail.com>
wrote:

> Hello,
>
> I wish to use logical replication in Postgres to capture transactions as
> CDC and forward them to a custom sink.
>
> To understand the overhead of logical replication workflow I created a toy
> subscriber using the V3PGReplicationStream that acknowledges LSNs after
> every 16k reads by calling setAppliedLSN, setFlushedLSN, and forceUpdateStatus.
> The toy subscriber is set up as a subscriber for a master Postgres
> instance that publishes changes using a Publication. I then run a
> write-heavy workload on this setup that generates transaction logs at
> approximately 235MBps. Postgres is run on a beefy machine with a 10+GBps
> network link between Postgres and the toy subscriber.
>
> My expectation with this setup was that the replication lag on master
> would be minimal as the subscriber acks the LSN almost immediately.
> However, I observe the replication lag to increase continuously for the
> duration of the test. Statistics in pg_replication_slots show that
> restart_lsn lags significantly behind confirmed_flush_lsn. Cursory
> reading on restart_lsn suggests that an increasing gap between restart_lsn
> and confirmed_flush_lsn means that Postgres needs to reclaim disk space
> and advance restart_lsn to catch up to confirmed_flush_lsn.
>
> With that context, I am looking for answers for two questions -
>
> 1. What work needs to happen in the database to advance restart_lsn to
> confirmed_flush_lsn?
> 2. What is the recommendation on tuning the database to improve the
> replication lag in such scenarios?
>
> Regards,
> Satyam
>
>
>
