[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: [HACKERS] Apparent walsender bug triggered by logical replication
From:       Tom Lane <tgl () sss ! pgh ! pa ! us>
Date:       2017-06-30 22:18:23
Message-ID: 30864.1498861103 () sss ! pgh ! pa ! us
[Download RAW message or body]

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <30730.1498860858.1@sss.pgh.pa.us>

I wrote:
> I've been poking into the src/test/subscription TAP tests, thinking
> that they seem a lot slower than they ought to be.  The first thing
> I came across was this bit in WaitForReplicationWorkerAttach():

>         /*
>          * We need timeout because we generally don't get notified via latch
>          * about the worker attach.
>          */
>         rc = WaitLatch(MyLatch,
>                        WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
>                        1000L, WAIT_EVENT_BGWORKER_STARTUP);

> Tracing showed that the worker is generally done attaching within two
> or three milliseconds of the request, making the 1000ms delay in this
> loop a rather serious killer of startup performance.  I think the right
> way to fix that is to arrange to have a signal sent back from the worker;
> but just to confirm that this *is* a bottleneck, I tried cutting the
> timeout way back.

I found that logicalrep_worker_stop() has pretty much the identical issue:
it needs to wait for a worker to attach or detach, and it isn't going
to get a latch event for that, and it's using 1-second wait quanta for
events that can be expected to occur within milliseconds.

I still think that the "right" fix would involve adding a way to get
a latch signal for attach/detach, but after some investigation it seemed
like a less-than-trivial change.  Since we could do without risky changes
post-beta, I propose that we just reduce these wait parameters to 10ms,
as in the first attached patch.  It may not ever be worth working harder
than that, since these waits only occur when starting/stopping/changing
logical replication.

I found another source of one-second-ish delays, which is that when
activity on the master stops, LogicalRepApplyLoop generally goes to sleep
without having ack'd the last bit of activity it replayed, since that
won't have been flushed by the walwriter yet.  It will send a feedback
message if it times out --- but that takes a whole second.  Maybe longer,
if unrelated wakeups prevent it from seeing WL_TIMEOUT for a few loop
iterations.  In reality, we can expect the last WAL to have been flushed
by the walwriter in, more or less, WalWriterDelay msec.  Therefore, the
second attached patch tweaks the LogicalRepApplyLoop loop to wait at most
WalWriterDelay if it's got unflushed transactions yet to ack to the
sender, and ensures that we will call send_feedback in the loop even if
no new data arrived.

These two changes cut about 15% off the runtime of the subscription tests
for me (compared to where things stand after 1f201a818).  They're probably
less interesting than that for production purposes, but I think it's worth
doing anyway.

			regards, tom lane


------- =_aaaaaaaaaa0
Content-Type: text/x-diff; name="shorten-worker-attach-detach-waits.patch";
	charset="us-ascii"
Content-ID: <30730.1498860858.2@sss.pgh.pa.us>
Content-Description: shorten-worker-attach-detach-waits.patch
Content-Transfer-Encoding: quoted-printable

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 961110c..d165d51 100644
*** a/src/backend/replication/logical/launcher.c
--- b/src/backend/replication/logical/launcher.c
*************** WaitForReplicationWorkerAttach(LogicalRe
*** 201,211 ****
  
  		/*
  		 * We need timeout because we generally don't get notified via latch
! 		 * about the worker attach.
  		 */
  		rc = WaitLatch(MyLatch,
  					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 					   1000L, WAIT_EVENT_BGWORKER_STARTUP);
  
  		/* emergency bailout if postmaster has died */
  		if (rc & WL_POSTMASTER_DEATH)
--- 201,211 ----
  
  		/*
  		 * We need timeout because we generally don't get notified via latch
! 		 * about the worker attach.  But we don't expect to have to wait long.
  		 */
  		rc = WaitLatch(MyLatch,
  					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 					   10L, WAIT_EVENT_BGWORKER_STARTUP);
  
  		/* emergency bailout if postmaster has died */
  		if (rc & WL_POSTMASTER_DEATH)
*************** retry:
*** 408,415 ****
  }
  
  /*
!  * Stop the logical replication worker and wait until it detaches from the
!  * slot.
   */
  void
  logicalrep_worker_stop(Oid subid, Oid relid)
--- 408,415 ----
  }
  
  /*
!  * Stop the logical replication worker for subid/relid, if any, and wait until
!  * it detaches from the slot.
   */
  void
  logicalrep_worker_stop(Oid subid, Oid relid)
*************** logicalrep_worker_stop(Oid subid, Oid re
*** 435,442 ****
  	generation = worker->generation;
  
  	/*
! 	 * If we found worker but it does not have proc set it is starting up,
! 	 * wait for it to finish and then kill it.
  	 */
  	while (worker->in_use && !worker->proc)
  	{
--- 435,442 ----
  	generation = worker->generation;
  
  	/*
! 	 * If we found a worker but it does not have proc set then it is still
! 	 * starting up; wait for it to finish starting and then kill it.
  	 */
  	while (worker->in_use && !worker->proc)
  	{
*************** logicalrep_worker_stop(Oid subid, Oid re
*** 444,453 ****
  
  		LWLockRelease(LogicalRepWorkerLock);
  
! 		/* Wait for signal. */
  		rc = WaitLatch(MyLatch,
  					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 					   1000L, WAIT_EVENT_BGWORKER_STARTUP);
  
  		/* emergency bailout if postmaster has died */
  		if (rc & WL_POSTMASTER_DEATH)
--- 444,453 ----
  
  		LWLockRelease(LogicalRepWorkerLock);
  
! 		/* Wait a bit --- we don't expect to have to wait long. */
  		rc = WaitLatch(MyLatch,
  					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 					   10L, WAIT_EVENT_BGWORKER_STARTUP);
  
  		/* emergency bailout if postmaster has died */
  		if (rc & WL_POSTMASTER_DEATH)
*************** logicalrep_worker_stop(Oid subid, Oid re
*** 459,465 ****
  			CHECK_FOR_INTERRUPTS();
  		}
  
! 		/* Check worker status. */
  		LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
  
  		/*
--- 459,465 ----
  			CHECK_FOR_INTERRUPTS();
  		}
  
! 		/* Recheck worker status. */
  		LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
  
  		/*
*************** logicalrep_worker_stop(Oid subid, Oid re
*** 480,506 ****
  
  	/* Now terminate the worker ... */
  	kill(worker->proc->pid, SIGTERM);
- 	LWLockRelease(LogicalRepWorkerLock);
  
  	/* ... and wait for it to die. */
  	for (;;)
  	{
  		int			rc;
  
! 		LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
  		if (!worker->proc || worker->generation != generation)
- 		{
- 			LWLockRelease(LogicalRepWorkerLock);
  			break;
- 		}
- 		LWLockRelease(LogicalRepWorkerLock);
  
! 		CHECK_FOR_INTERRUPTS();
  
! 		/* Wait for more work. */
  		rc = WaitLatch(MyLatch,
  					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 					   1000L, WAIT_EVENT_BGWORKER_SHUTDOWN);
  
  		/* emergency bailout if postmaster has died */
  		if (rc & WL_POSTMASTER_DEATH)
--- 480,501 ----
  
  	/* Now terminate the worker ... */
  	kill(worker->proc->pid, SIGTERM);
  
  	/* ... and wait for it to die. */
  	for (;;)
  	{
  		int			rc;
  
! 		/* is it gone? */
  		if (!worker->proc || worker->generation != generation)
  			break;
  
! 		LWLockRelease(LogicalRepWorkerLock);
  
! 		/* Wait a bit --- we don't expect to have to wait long. */
  		rc = WaitLatch(MyLatch,
  					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
  
  		/* emergency bailout if postmaster has died */
  		if (rc & WL_POSTMASTER_DEATH)
*************** logicalrep_worker_stop(Oid subid, Oid re
*** 511,517 ****
--- 506,516 ----
  			ResetLatch(MyLatch);
  			CHECK_FOR_INTERRUPTS();
  		}
+ 
+ 		LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
  	}
+ 
+ 	LWLockRelease(LogicalRepWorkerLock);
  }
  
  /*

------- =_aaaaaaaaaa0
Content-Type: text/x-diff; name="speedup-flush-feedback.patch";
	charset="us-ascii"
Content-ID: <30730.1498860858.3@sss.pgh.pa.us>
Content-Description: speedup-flush-feedback.patch
Content-Transfer-Encoding: quoted-printable

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 898c497..0d48dfa 100644
*** a/src/backend/replication/logical/worker.c
--- b/src/backend/replication/logical/worker.c
***************
*** 52,57 ****
--- 52,58 ----
  
  #include "postmaster/bgworker.h"
  #include "postmaster/postmaster.h"
+ #include "postmaster/walwriter.h"
  
  #include "replication/decode.h"
  #include "replication/logical.h"
*************** LogicalRepApplyLoop(XLogRecPtr last_rece
*** 1027,1032 ****
--- 1028,1034 ----
  		bool		endofstream = false;
  		TimestampTz last_recv_timestamp = GetCurrentTimestamp();
  		bool		ping_sent = false;
+ 		long		wait_time;
  
  		CHECK_FOR_INTERRUPTS();
  
*************** LogicalRepApplyLoop(XLogRecPtr last_rece
*** 1114,1124 ****
  
  				len = walrcv_receive(wrconn, &buf, &fd);
  			}
- 
- 			/* confirm all writes at once */
- 			send_feedback(last_received, false, false);
  		}
  
  		if (!in_remote_transaction)
  		{
  			/*
--- 1116,1126 ----
  
  				len = walrcv_receive(wrconn, &buf, &fd);
  			}
  		}
  
+ 		/* confirm all writes so far */
+ 		send_feedback(last_received, false, false);
+ 
  		if (!in_remote_transaction)
  		{
  			/*
*************** LogicalRepApplyLoop(XLogRecPtr last_rece
*** 1147,1158 ****
  		}
  
  		/*
! 		 * Wait for more data or latch.
  		 */
  		rc = WaitLatchOrSocket(MyLatch,
  							   WL_SOCKET_READABLE | WL_LATCH_SET |
  							   WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 							   fd, NAPTIME_PER_CYCLE,
  							   WAIT_EVENT_LOGICAL_APPLY_MAIN);
  
  		/* Emergency bailout if postmaster has died */
--- 1149,1169 ----
  		}
  
  		/*
! 		 * Wait for more data or latch.  If we have unflushed transactions,
! 		 * wake up after WalWriterDelay to see if they've been flushed yet (in
! 		 * which case we should send a feedback message).  Otherwise, there's
! 		 * no particular urgency about waking up unless we get data or a
! 		 * signal.
  		 */
+ 		if (!dlist_is_empty(&lsn_mapping))
+ 			wait_time = WalWriterDelay;
+ 		else
+ 			wait_time = NAPTIME_PER_CYCLE;
+ 
  		rc = WaitLatchOrSocket(MyLatch,
  							   WL_SOCKET_READABLE | WL_LATCH_SET |
  							   WL_TIMEOUT | WL_POSTMASTER_DEATH,
! 							   fd, wait_time,
  							   WAIT_EVENT_LOGICAL_APPLY_MAIN);
  
  		/* Emergency bailout if postmaster has died */

------- =_aaaaaaaaaa0
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

------- =_aaaaaaaaaa0--

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic