[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-hackers
Subject:    Re: race condition when writing pg_control
From:       "Bossart, Nathan" <bossartn () amazon ! com>
Date:       2020-05-31 21:11:35
Message-ID: A9F111A3-560A-4C21-A35B-9D3182704187 () amazon ! com
[Download RAW message or body]

[Attachment #2 (text/plain)]

On 5/29/20, 12:24 AM, "Fujii Masao" <masao.fujii@oss.nttdata.com> wrote:
> On 2020/05/27 16:10, Michael Paquier wrote:
>> On Tue, May 26, 2020 at 07:30:54PM +0000, Bossart, Nathan wrote:
>>> While an assertion in UpdateControlFile() would not have helped us
>>> catch the problem I initially reported, it does seem worthwhile to add
>>> it.  I have attached a patch that adds this assertion and also
>>> attempts to fix XLogReportParameters().  Since there is only one place
>>> where we feel it is safe to call UpdateControlFile() without a lock, I
>>> just changed it to take the lock.  I don't think this adds any sort of
>>> significant contention risk, and IMO it is a bit cleaner than the
>>> boolean flag.
>>
>> Let's see what Fujii-san and Thomas think about that.  I'd rather
>> avoid taking a lock here because we don't need it and because it makes
>> things IMO confusing with the beginning of StartupXLOG() where a lot
>> of the fields are read, even if we go without this extra assertion.
>
> I have no strong opinion about this, but I tend to agree with Michael here.
>
>>> For the XLogReportParameters() fix, I simply added an exclusive lock
>>> acquisition for the portion that updates the values in shared memory
>>> and calls UpdateControlFile().  IIUC the first part of this function
>>> that accesses several ControlFile values should be safe, as none of
>>> them can be updated after server start.
>>
>> They can get updated when replaying a XLOG_PARAMETER_CHANGE record.
>> But you are right as all of this happens in the startup process, so
>> your patch looks right to me here.
>
> LGTM.

Thanks for the feedback.  I've attached a new set of patches.

Nathan


["0003-Assert-that-ControlFileLock-is-held-within-UpdateCon.patch" (application/octet-stream)]

From edc97ec1728c2d05f5d6a0bc6b3a7d665b4b8e6e Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Sun, 31 May 2020 20:48:26 +0000
Subject: [PATCH 3/3] Assert that ControlFileLock is held within
 UpdateControlFile().

Most callers of UpdateControlFile() should acquire ControlFileLock
exclusively beforehand.  This change adds an assertion that the
lock is held.  Callers that do not require the lock can pass a
boolean flag to bypass the assertion.

Back-patch to all supported releases.

Author: Nathan Bossart, based on a suggestion from Michael Paquier
Reviewed-by: Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
---
 src/backend/access/transam/xlog.c | 34 ++++++++++++++++++++--------------
 src/include/access/xlog.h         |  2 +-
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 55cac186dc..fe23515e81 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2820,7 +2820,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 		{
 			ControlFile->minRecoveryPoint = newMinRecoveryPoint;
 			ControlFile->minRecoveryPointTLI = newMinRecoveryPointTLI;
-			UpdateControlFile();
+			UpdateControlFile(true);
 			minRecoveryPoint = newMinRecoveryPoint;
 			minRecoveryPointTLI = newMinRecoveryPointTLI;
 
@@ -4434,7 +4434,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 				 */
 				updateMinRecoveryPoint = true;
 
-				UpdateControlFile();
+				UpdateControlFile(true);
 
 				/*
 				 * We update SharedRecoveryState while holding the lock on
@@ -4896,10 +4896,16 @@ ReadControlFile(void)
 /*
  * Utility wrapper to update the control file.  Note that the control
  * file gets flushed.
+ *
+ * Most callers should acquire ControlFileLock exclusively before calling
+ * this function.  Callers that can safely use this function without holding
+ * the lock should set holds_lock to false.  Otherwise, holds_lock should be
+ * set to true.
  */
 void
-UpdateControlFile(void)
+UpdateControlFile(bool holds_lock)
 {
+	Assert(!holds_lock || LWLockHeldByMeInMode(ControlFileLock, LW_EXCLUSIVE));
 	update_controlfile(DataDir, ControlFile, true);
 }
 
@@ -6956,7 +6962,7 @@ StartupXLOG(void)
 		}
 		ControlFile->time = (pg_time_t) time(NULL);
 		/* No need to hold ControlFileLock yet, we aren't up far enough */
-		UpdateControlFile();
+		UpdateControlFile(false);
 
 		/*
 		 * Initialize our local copy of minRecoveryPoint.  When doing crash
@@ -7940,7 +7946,7 @@ StartupXLOG(void)
 	XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
 	SpinLockRelease(&XLogCtl->info_lck);
 
-	UpdateControlFile();
+	UpdateControlFile(true);
 	LWLockRelease(ControlFileLock);
 
 	/*
@@ -8007,7 +8013,7 @@ CheckRecoveryConsistency(void)
 		ControlFile->backupStartPoint = InvalidXLogRecPtr;
 		ControlFile->backupEndPoint = InvalidXLogRecPtr;
 		ControlFile->backupEndRequired = false;
-		UpdateControlFile();
+		UpdateControlFile(true);
 
 		LWLockRelease(ControlFileLock);
 	}
@@ -8762,7 +8768,7 @@ CreateCheckPoint(int flags)
 		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 		ControlFile->state = DB_SHUTDOWNING;
 		ControlFile->time = (pg_time_t) time(NULL);
-		UpdateControlFile();
+		UpdateControlFile(true);
 		LWLockRelease(ControlFileLock);
 	}
 
@@ -9044,7 +9050,7 @@ CreateCheckPoint(int flags)
 	ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
 	SpinLockRelease(&XLogCtl->ulsn_lck);
 
-	UpdateControlFile();
+	UpdateControlFile(true);
 	LWLockRelease(ControlFileLock);
 
 	/* Update shared-memory copy of checkpoint XID/epoch */
@@ -9153,7 +9159,7 @@ CreateEndOfRecoveryRecord(void)
 	ControlFile->time = (pg_time_t) time(NULL);
 	ControlFile->minRecoveryPoint = recptr;
 	ControlFile->minRecoveryPointTLI = ThisTimeLineID;
-	UpdateControlFile();
+	UpdateControlFile(true);
 	LWLockRelease(ControlFileLock);
 
 	END_CRIT_SECTION();
@@ -9304,7 +9310,7 @@ CreateRestartPoint(int flags)
 			LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 			ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
 			ControlFile->time = (pg_time_t) time(NULL);
-			UpdateControlFile();
+			UpdateControlFile(true);
 			LWLockRelease(ControlFileLock);
 		}
 		LWLockRelease(CheckpointLock);
@@ -9386,7 +9392,7 @@ CreateRestartPoint(int flags)
 		}
 		if (flags & CHECKPOINT_IS_SHUTDOWN)
 			ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
-		UpdateControlFile();
+		UpdateControlFile(true);
 	}
 	LWLockRelease(ControlFileLock);
 
@@ -9753,7 +9759,7 @@ XLogReportParameters(void)
 		ControlFile->wal_level = wal_level;
 		ControlFile->wal_log_hints = wal_log_hints;
 		ControlFile->track_commit_timestamp = track_commit_timestamp;
-		UpdateControlFile();
+		UpdateControlFile(true);
 
 		LWLockRelease(ControlFileLock);
 	}
@@ -10142,7 +10148,7 @@ xlog_redo(XLogReaderState *record)
 			}
 			ControlFile->backupStartPoint = InvalidXLogRecPtr;
 			ControlFile->backupEndRequired = false;
-			UpdateControlFile();
+			UpdateControlFile(true);
 
 			LWLockRelease(ControlFileLock);
 		}
@@ -10186,7 +10192,7 @@ xlog_redo(XLogReaderState *record)
 								ControlFile->track_commit_timestamp);
 		ControlFile->track_commit_timestamp = xlrec.track_commit_timestamp;
 
-		UpdateControlFile();
+		UpdateControlFile(true);
 		LWLockRelease(ControlFileLock);
 
 		/* Check to see if any parameter change gives a problem on recovery */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index e917dfe92d..bf1392e6e5 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -312,7 +312,7 @@ extern void SetRecoveryPause(bool recoveryPause);
 extern TimestampTz GetLatestXTime(void);
 extern TimestampTz GetCurrentChunkReplayStartTime(void);
 
-extern void UpdateControlFile(void);
+extern void UpdateControlFile(bool holds_lock);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
 extern bool DataChecksumsEnabled(void);
-- 
2.16.6


["0002-Acquire-ControlFileLock-within-XLogReportParameters.patch" (application/octet-stream)]

From cd4bd8db0b73b445d061f38a09cf377083666062 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Sun, 31 May 2020 19:47:36 +0000
Subject: [PATCH 2/3] Acquire ControlFileLock within XLogReportParameters().

XLogReportParameters() must acquire ControlFileLock before
modifying ControlFile and calling UpdateControlFile(), or the
checkpointer could write out a control file with a bad checksum.

Back-patch to all supported releases.

Reported-by: Fujii Masao
Author: Nathan Bossart
Reviewed-by: Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
---
 src/backend/access/transam/xlog.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 274b808476..55cac186dc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9743,6 +9743,8 @@ XLogReportParameters(void)
 			XLogFlush(recptr);
 		}
 
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
 		ControlFile->MaxConnections = MaxConnections;
 		ControlFile->max_worker_processes = max_worker_processes;
 		ControlFile->max_wal_senders = max_wal_senders;
@@ -9752,6 +9754,8 @@ XLogReportParameters(void)
 		ControlFile->wal_log_hints = wal_log_hints;
 		ControlFile->track_commit_timestamp = track_commit_timestamp;
 		UpdateControlFile();
+
+		LWLockRelease(ControlFileLock);
 	}
 }
 
-- 
2.16.6


["0001-Fix-race-condition-that-could-corrupt-pg_control.patch" (application/octet-stream)]

From 824d2b622eae134cec8f732a29bac69c83f03484 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 22 May 2020 16:23:59 +1200
Subject: [PATCH 1/3] Fix race condition that could corrupt pg_control.

The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire
ControlFileLock before modifying ControlFile->checkPointCopy, or the
checkpointer could write out a control file with a bad checksum.

Back-patch to all supported releases.

Author: Nathan Bossart
Reviewed-by: Fujii Masao, Thomas Munro
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
---
 src/backend/access/transam/xlog.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ca09d81b08..274b808476 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9976,7 +9976,9 @@ xlog_redo(XLogReaderState *record)
 		}
 
 		/* ControlFile->checkPointCopy always tracks the latest ckpt XID */
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 		ControlFile->checkPointCopy.nextFullXid = checkPoint.nextFullXid;
+		LWLockRelease(ControlFileLock);
 
 		/* Update shared-memory copy of checkpoint XID/epoch */
 		SpinLockAcquire(&XLogCtl->info_lck);
@@ -10033,7 +10035,9 @@ xlog_redo(XLogReaderState *record)
 			SetTransactionIdLimit(checkPoint.oldestXid,
 								  checkPoint.oldestXidDB);
 		/* ControlFile->checkPointCopy always tracks the latest ckpt XID */
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 		ControlFile->checkPointCopy.nextFullXid = checkPoint.nextFullXid;
+		LWLockRelease(ControlFileLock);
 
 		/* Update shared-memory copy of checkpoint XID/epoch */
 		SpinLockAcquire(&XLogCtl->info_lck);
-- 
2.16.6



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic