List:       postgresql-hackers
Subject:    Extensible storage manager API - SMGR hook Redux
From:       Matthias van de Meent <boekewurm+postgres () gmail ! com>
Date:       2023-06-30 12:26:44
Message-ID: CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK=PCNWEMrA () mail ! gmail ! com

Hi hackers,

At Neon, we've been working on removing the file system dependency
from PostgreSQL and replacing it with a distributed storage layer. So
far, we've had the most success by replacing the implementation of the
smgr API, but that did require some core modifications like those
proposed early last year by Anastasia [0].

As mentioned in the previous thread, there are several reasons why you
would want to use a non-default storage manager: storage-level
compression, encryption, and disk limit quotas [0]; offloading of cold
relation data was also mentioned [1].

In the thread on Anastasia's patch, Yura Sokolov mentioned that
instead of a hook-based smgr extension, a registration-based smgr
would be preferred, with integration into namespaces. Please find
attached an as-yet incomplete patch that starts to do that.

The patch is still incomplete (as it isn't derived from Anastasia's
patch), but I would like comments on it regardless, as it modifies a
fairly fundamental component of PostgreSQL, and it is often better to
get comments early in the development cycle. One significant issue
that I've seen so far is that catcache is not guaranteed to be
available in all backends that need to do smgr operations, and I've
not yet found a good solution.

Changes compared to HEAD:
- smgrsw is now dynamically allocated and grows as new storage
managers are loaded (during shared_preload_libraries)
- CREATE TABLESPACE has new optional syntax USING smgrname (option [, ...])
- tablespace storage is (planned to be) fully managed by smgr through
some new smgr APIs
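
As a rough illustration of the first point, here is a standalone model
of a registration table that grows by one entry per registered storage
manager. It is a sketch, not code from the patch: f_smgr is cut down to
a name and one placeholder callback, realloc() stands in for
palloc/repalloc in TopMemoryContext, failures return -1 where the patch
uses elog(FATAL), and smgr_lookup() is a hypothetical helper showing how
a name given in CREATE TABLESPACE ... USING could be resolved to an id.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins; the real f_smgr has many more callbacks. */
typedef uint8_t SMgrId;
#define MaxSMgrId UINT8_MAX

typedef struct f_smgr
{
	const char *name;
	int			(*smgr_nblocks) (void);	/* placeholder callback */
} f_smgr;

static f_smgr *smgrsw = NULL;	/* grows by one entry per registration */
static int	NSmgr = 0;

/* Copy *smgr into the table; returns its id, or -1 on failure. */
static int
smgr_register(const f_smgr *smgr)
{
	f_smgr	   *grown;

	if (smgr->name == NULL || *smgr->name == '\0')
		return -1;				/* invalid name */
	if (NSmgr == MaxSMgrId)
		return -1;				/* id space exhausted */

	grown = realloc(smgrsw, sizeof(f_smgr) * (NSmgr + 1));
	if (grown == NULL)
		return -1;				/* table left unchanged */
	smgrsw = grown;
	memcpy(&smgrsw[NSmgr], smgr, sizeof(f_smgr));
	return NSmgr++;
}

/* Hypothetical helper: resolve a name such as "md" to its id, or -1. */
static int
smgr_lookup(const char *name)
{
	for (int i = 0; i < NSmgr; i++)
		if (strcmp(smgrsw[i].name, name) == 0)
			return i;
	return -1;
}

/* Two dummy storage managers for demonstration. */
static int md_nblocks(void) { return 8; }
static int demo_nblocks(void) { return 42; }
```

In this model, callers only ever hold an id and dispatch through
smgrsw[id], so the table can be reallocated while registrations are
still coming in without invalidating anything they hold.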

Changes compared to Anastasia's patch:
- extensions do not get to hook and replace the API of the smgr code
directly - they are hidden behind the smgr registry.
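
One consequence of the registry, visible in 0001, is that md.c's
per-relation fields move out of SMgrRelationData into a struct that
embeds it as its first member, and each smgr passes the size of its own
struct at registration so that entries can be sized for the largest
one. A minimal standalone sketch of that layout follows; RelationBase,
MdRelation, note_entry_size() and alloc_entry() are illustrative
stand-ins, not the patch's names, and calloc() stands in for the
hash-table allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for SMgrRelationData: state shared by all storage managers. */
typedef struct RelationBase
{
	int			smgr_which;
} RelationBase;

/* Stand-in for MdSMgrRelationData: private md.c state after the parent. */
typedef struct MdRelation
{
	RelationBase reln;			/* parent data; must be the first member */
	int			md_num_open_segs[4];
} MdRelation;

static size_t largest_entry_size = 0;

/* Called once per registered smgr with the size of its relation struct. */
static void
note_entry_size(size_t size)
{
	if (size > largest_entry_size)
		largest_entry_size = size;
}

/* Every entry is allocated large enough for any registered smgr. */
static RelationBase *
alloc_entry(int which)
{
	RelationBase *reln = calloc(1, largest_entry_size);

	if (reln != NULL)
		reln->smgr_which = which;
	return reln;
}

/* An smgr callback receives the parent pointer and downcasts it. */
static int
md_open_segments(RelationBase *reln, int forknum)
{
	MdRelation *mdreln = (MdRelation *) reln;

	return mdreln->md_num_open_segs[forknum];
}
```

The downcast is only valid because the parent struct is the first
member, which is why the patch asserts on smgr_which before using the
md-specific view in mdcreate().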

Successes:
- 0001 passes tests (make check-world)
- 0002 builds without warnings (make)

TODO:
- fix dependency failures when catcache is unavailable
- tablespace redo is currently broken with 0002
- fix tests for 0002
- ensure that pg_dump etc. work with the new tablespace storage manager options

Looking forward to any comments, suggestions and reviews.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)


[0] https://www.postgresql.org/message-id/CAP4vRV6JKXyFfEOf%3Dn%2Bv5RGsZywAQ3CTM8ESWvgq%2BS87Tmgx_g%40mail.gmail.com
[1] https://www.postgresql.org/message-id/D365F19F-BC3E-4F96-A91E-8DB13049749E@yandex-team.ru



["v1-0001-Expose-f_smgr-to-extensions-for-manual-implementa.patch" (application/octet-stream)]

From bc4f8f9b43dc050ac2fa92d0770eb63c822838b7 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Tue, 27 Jun 2023 15:59:23 +0200
Subject: [PATCH v1 1/2] Expose f_smgr to extensions for manual implementation

There are various reasons why one would want to create their own
implementation of a storage manager, among which are block-level compression,
encryption and offloading to cold storage. This is a first patch that
allows extensions to register their own SMgr.

Note, however, that this SMgr is not yet used - only the first SMgr to register
is used, and this is currently the md.c smgr. Future commits will include
facilities to select an SMgr for each tablespace.
---
 src/backend/postmaster/postmaster.c |   5 +
 src/backend/storage/smgr/md.c       | 164 ++++++++++++++++++----------
 src/backend/storage/smgr/smgr.c     | 126 ++++++++++-----------
 src/backend/utils/init/miscinit.c   |  12 ++
 src/include/miscadmin.h             |   1 +
 src/include/storage/md.h            |   4 +
 src/include/storage/smgr.h          |  56 ++++++++--
 7 files changed, 242 insertions(+), 126 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 4c49393fc5..8685b9fde6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1002,6 +1002,11 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	ApplyLauncherRegister();
 
+	/*
+	 * Register built-in managers that are not part of static arrays
+	 */
+	register_builtin_dynamic_managers();
+
 	/*
 	 * process any libraries that should be preloaded at postmaster start
 	 */
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 30dbc02f82..690bdd27c5 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -86,6 +86,21 @@ typedef struct _MdfdVec
 } MdfdVec;
 
 static MemoryContext MdCxt;		/* context for all MdfdVec objects */
+SMgrId MdSMgrId;
+
+typedef struct MdSMgrRelationData
+{
+	/* parent data */
+	SMgrRelationData reln;
+	/*
+	 * for md.c; per-fork arrays of the number of open segments
+	 * (md_num_open_segs) and the segments themselves (md_seg_fds).
+	 */
+	int			md_num_open_segs[MAX_FORKNUM + 1];
+	struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];
+} MdSMgrRelationData;
+
+typedef MdSMgrRelationData *MdSMgrRelation;
 
 
 /* Populate a file tag describing an md.c segment file. */
@@ -120,26 +135,52 @@ static MemoryContext MdCxt;		/* context for all MdfdVec objects */
 #define EXTENSION_DONT_OPEN			(1 << 5)
 
 
+void mdsmgr_register(void)
+{
+	/* magnetic disk */
+	f_smgr md_smgr = (f_smgr) {
+		.name = "md",
+		.smgr_init = mdinit,
+		.smgr_shutdown = NULL,
+		.smgr_open = mdopen,
+		.smgr_close = mdclose,
+		.smgr_create = mdcreate,
+		.smgr_exists = mdexists,
+		.smgr_unlink = mdunlink,
+		.smgr_extend = mdextend,
+		.smgr_zeroextend = mdzeroextend,
+		.smgr_prefetch = mdprefetch,
+		.smgr_read = mdread,
+		.smgr_write = mdwrite,
+		.smgr_writeback = mdwriteback,
+		.smgr_nblocks = mdnblocks,
+		.smgr_truncate = mdtruncate,
+		.smgr_immedsync = mdimmedsync,
+	};
+
+	MdSMgrId = smgr_register(&md_smgr, sizeof(MdSMgrRelationData));
+}
+
 /* local routines */
 static void mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum,
 						 bool isRedo);
-static MdfdVec *mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior);
-static void register_dirty_segment(SMgrRelation reln, ForkNumber forknum,
+static MdfdVec *mdopenfork(MdSMgrRelation reln, ForkNumber forknum, int behavior);
+static void register_dirty_segment(MdSMgrRelation reln, ForkNumber forknum,
 								   MdfdVec *seg);
 static void register_unlink_segment(RelFileLocatorBackend rlocator, ForkNumber forknum,
 									BlockNumber segno);
 static void register_forget_request(RelFileLocatorBackend rlocator, ForkNumber forknum,
 									BlockNumber segno);
-static void _fdvec_resize(SMgrRelation reln,
+static void _fdvec_resize(MdSMgrRelation reln,
 						  ForkNumber forknum,
 						  int nseg);
-static char *_mdfd_segpath(SMgrRelation reln, ForkNumber forknum,
+static char *_mdfd_segpath(MdSMgrRelation reln, ForkNumber forknum,
 						   BlockNumber segno);
-static MdfdVec *_mdfd_openseg(SMgrRelation reln, ForkNumber forknum,
+static MdfdVec *_mdfd_openseg(MdSMgrRelation reln, ForkNumber forknum,
 							  BlockNumber segno, int oflags);
-static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forknum,
+static MdfdVec *_mdfd_getseg(MdSMgrRelation reln, ForkNumber forknum,
 							 BlockNumber blkno, bool skipFsync, int behavior);
-static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
+static BlockNumber _mdnblocks(MdSMgrRelation reln, ForkNumber forknum,
 							  MdfdVec *seg);
 
 static inline int
@@ -194,11 +235,13 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	MdfdVec    *mdfd;
 	char	   *path;
 	File		fd;
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
+	Assert(reln->smgr_which == MdSMgrId);
 
-	if (isRedo && reln->md_num_open_segs[forknum] > 0)
+	if (isRedo && mdreln->md_num_open_segs[forknum] > 0)
 		return;					/* created and opened already... */
 
-	Assert(reln->md_num_open_segs[forknum] == 0);
+	Assert(mdreln->md_num_open_segs[forknum] == 0);
 
 	/*
 	 * We may be using the target table space for the first time in this
@@ -235,8 +278,8 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 
 	pfree(path);
 
-	_fdvec_resize(reln, forknum, 1);
-	mdfd = &reln->md_seg_fds[forknum][0];
+	_fdvec_resize(mdreln, forknum, 1);
+	mdfd = &mdreln->md_seg_fds[forknum][0];
 	mdfd->mdfd_vfd = fd;
 	mdfd->mdfd_segno = 0;
 }
@@ -462,6 +505,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	off_t		seekpos;
 	int			nbytes;
 	MdfdVec    *v;
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 
 	/* If this build supports direct I/O, the buffer must be I/O aligned. */
 	if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
@@ -485,7 +529,7 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 						relpath(reln->smgr_rlocator, forknum),
 						InvalidBlockNumber)));
 
-	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
+	v = _mdfd_getseg(mdreln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
 
 	seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
 
@@ -509,9 +553,9 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	}
 
 	if (!skipFsync && !SmgrIsTemp(reln))
-		register_dirty_segment(reln, forknum, v);
+		register_dirty_segment(mdreln, forknum, v);
 
-	Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
+	Assert(_mdnblocks(mdreln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
 }
 
 /*
@@ -527,6 +571,7 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
 	MdfdVec    *v;
 	BlockNumber curblocknum = blocknum;
 	int			remblocks = nblocks;
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 
 	Assert(nblocks > 0);
 
@@ -558,7 +603,7 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
 		else
 			numblocks = remblocks;
 
-		v = _mdfd_getseg(reln, forknum, curblocknum, skipFsync, EXTENSION_CREATE);
+		v = _mdfd_getseg(mdreln, forknum, curblocknum, skipFsync, EXTENSION_CREATE);
 
 		Assert(segstartblock < RELSEG_SIZE);
 		Assert(segstartblock + numblocks <= RELSEG_SIZE);
@@ -613,9 +658,9 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
 		}
 
 		if (!skipFsync && !SmgrIsTemp(reln))
-			register_dirty_segment(reln, forknum, v);
+			register_dirty_segment(mdreln, forknum, v);
 
-		Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
+		Assert(_mdnblocks(mdreln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
 
 		remblocks -= numblocks;
 		curblocknum += numblocks;
@@ -633,7 +678,7 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
  * invent one out of whole cloth.
  */
 static MdfdVec *
-mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior)
+mdopenfork(MdSMgrRelation reln, ForkNumber forknum, int behavior)
 {
 	MdfdVec    *mdfd;
 	char	   *path;
@@ -643,7 +688,7 @@ mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior)
 	if (reln->md_num_open_segs[forknum] > 0)
 		return &reln->md_seg_fds[forknum][0];
 
-	path = relpath(reln->smgr_rlocator, forknum);
+	path = relpath(reln->reln.smgr_rlocator, forknum);
 
 	fd = PathNameOpenFile(path, _mdfd_open_flags());
 
@@ -678,9 +723,10 @@ mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior)
 void
 mdopen(SMgrRelation reln)
 {
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 	/* mark it not open */
 	for (int forknum = 0; forknum <= MAX_FORKNUM; forknum++)
-		reln->md_num_open_segs[forknum] = 0;
+		mdreln->md_num_open_segs[forknum] = 0;
 }
 
 /*
@@ -689,7 +735,8 @@ mdopen(SMgrRelation reln)
 void
 mdclose(SMgrRelation reln, ForkNumber forknum)
 {
-	int			nopensegs = reln->md_num_open_segs[forknum];
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
+	int			nopensegs = mdreln->md_num_open_segs[forknum];
 
 	/* No work if already closed */
 	if (nopensegs == 0)
@@ -698,10 +745,10 @@ mdclose(SMgrRelation reln, ForkNumber forknum)
 	/* close segments starting from the end */
 	while (nopensegs > 0)
 	{
-		MdfdVec    *v = &reln->md_seg_fds[forknum][nopensegs - 1];
+		MdfdVec    *v = &mdreln->md_seg_fds[forknum][nopensegs - 1];
 
 		FileClose(v->mdfd_vfd);
-		_fdvec_resize(reln, forknum, nopensegs - 1);
+		_fdvec_resize(mdreln, forknum, nopensegs - 1);
 		nopensegs--;
 	}
 }
@@ -715,10 +762,11 @@ mdprefetch(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum)
 #ifdef USE_PREFETCH
 	off_t		seekpos;
 	MdfdVec    *v;
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 
 	Assert((io_direct_flags & IO_DIRECT_DATA) == 0);
 
-	v = _mdfd_getseg(reln, forknum, blocknum, false,
+	v = _mdfd_getseg(mdreln, forknum, blocknum, false,
 					 InRecovery ? EXTENSION_RETURN_NULL : EXTENSION_FAIL);
 	if (v == NULL)
 		return false;
@@ -743,6 +791,7 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	off_t		seekpos;
 	int			nbytes;
 	MdfdVec    *v;
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 
 	/* If this build supports direct I/O, the buffer must be I/O aligned. */
 	if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
@@ -754,7 +803,7 @@ mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 										reln->smgr_rlocator.locator.relNumber,
 										reln->smgr_rlocator.backend);
 
-	v = _mdfd_getseg(reln, forknum, blocknum, false,
+	v = _mdfd_getseg(mdreln, forknum, blocknum, false,
 					 EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);
 
 	seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
@@ -812,6 +861,7 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	off_t		seekpos;
 	int			nbytes;
 	MdfdVec    *v;
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 
 	/* If this build supports direct I/O, the buffer must be I/O aligned. */
 	if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
@@ -828,7 +878,7 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 										 reln->smgr_rlocator.locator.relNumber,
 										 reln->smgr_rlocator.backend);
 
-	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync,
+	v = _mdfd_getseg(mdreln, forknum, blocknum, skipFsync,
 					 EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);
 
 	seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
@@ -863,7 +913,7 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	}
 
 	if (!skipFsync && !SmgrIsTemp(reln))
-		register_dirty_segment(reln, forknum, v);
+		register_dirty_segment(mdreln, forknum, v);
 }
 
 /*
@@ -876,6 +926,7 @@ void
 mdwriteback(SMgrRelation reln, ForkNumber forknum,
 			BlockNumber blocknum, BlockNumber nblocks)
 {
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 	Assert((io_direct_flags & IO_DIRECT_DATA) == 0);
 
 	/*
@@ -890,7 +941,7 @@ mdwriteback(SMgrRelation reln, ForkNumber forknum,
 		int			segnum_start,
 					segnum_end;
 
-		v = _mdfd_getseg(reln, forknum, blocknum, true /* not used */ ,
+		v = _mdfd_getseg(mdreln, forknum, blocknum, true /* not used */ ,
 						 EXTENSION_DONT_OPEN);
 
 		/*
@@ -937,11 +988,12 @@ mdnblocks(SMgrRelation reln, ForkNumber forknum)
 	MdfdVec    *v;
 	BlockNumber nblocks;
 	BlockNumber segno;
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 
-	mdopenfork(reln, forknum, EXTENSION_FAIL);
+	mdopenfork(mdreln, forknum, EXTENSION_FAIL);
 
 	/* mdopen has opened the first segment */
-	Assert(reln->md_num_open_segs[forknum] > 0);
+	Assert(mdreln->md_num_open_segs[forknum] > 0);
 
 	/*
 	 * Start from the last open segments, to avoid redundant seeks.  We have
@@ -956,12 +1008,12 @@ mdnblocks(SMgrRelation reln, ForkNumber forknum)
 	 * that's OK because the checkpointer never needs to compute relation
 	 * size.)
 	 */
-	segno = reln->md_num_open_segs[forknum] - 1;
-	v = &reln->md_seg_fds[forknum][segno];
+	segno = mdreln->md_num_open_segs[forknum] - 1;
+	v = &mdreln->md_seg_fds[forknum][segno];
 
 	for (;;)
 	{
-		nblocks = _mdnblocks(reln, forknum, v);
+		nblocks = _mdnblocks(mdreln, forknum, v);
 		if (nblocks > ((BlockNumber) RELSEG_SIZE))
 			elog(FATAL, "segment too big");
 		if (nblocks < ((BlockNumber) RELSEG_SIZE))
@@ -979,7 +1031,7 @@ mdnblocks(SMgrRelation reln, ForkNumber forknum)
 		 * undermines _mdfd_getseg's attempts to notice and report an error
 		 * upon access to a missing segment.
 		 */
-		v = _mdfd_openseg(reln, forknum, segno, 0);
+		v = _mdfd_openseg(mdreln, forknum, segno, 0);
 		if (v == NULL)
 			return segno * ((BlockNumber) RELSEG_SIZE);
 	}
@@ -994,6 +1046,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	BlockNumber curnblk;
 	BlockNumber priorblocks;
 	int			curopensegs;
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 
 	/*
 	 * NOTE: mdnblocks makes sure we have opened all active segments, so that
@@ -1017,14 +1070,14 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 * Truncate segments, starting at the last one. Starting at the end makes
 	 * managing the memory for the fd array easier, should there be errors.
 	 */
-	curopensegs = reln->md_num_open_segs[forknum];
+	curopensegs = mdreln->md_num_open_segs[forknum];
 	while (curopensegs > 0)
 	{
 		MdfdVec    *v;
 
 		priorblocks = (curopensegs - 1) * RELSEG_SIZE;
 
-		v = &reln->md_seg_fds[forknum][curopensegs - 1];
+		v = &mdreln->md_seg_fds[forknum][curopensegs - 1];
 
 		if (priorblocks > nblocks)
 		{
@@ -1039,13 +1092,13 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 								FilePathName(v->mdfd_vfd))));
 
 			if (!SmgrIsTemp(reln))
-				register_dirty_segment(reln, forknum, v);
+				register_dirty_segment(mdreln, forknum, v);
 
 			/* we never drop the 1st segment */
-			Assert(v != &reln->md_seg_fds[forknum][0]);
+			Assert(v != &mdreln->md_seg_fds[forknum][0]);
 
 			FileClose(v->mdfd_vfd);
-			_fdvec_resize(reln, forknum, curopensegs - 1);
+			_fdvec_resize(mdreln, forknum, curopensegs - 1);
 		}
 		else if (priorblocks + ((BlockNumber) RELSEG_SIZE) > nblocks)
 		{
@@ -1065,7 +1118,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 								FilePathName(v->mdfd_vfd),
 								nblocks)));
 			if (!SmgrIsTemp(reln))
-				register_dirty_segment(reln, forknum, v);
+				register_dirty_segment(mdreln, forknum, v);
 		}
 		else
 		{
@@ -1095,6 +1148,7 @@ mdimmedsync(SMgrRelation reln, ForkNumber forknum)
 {
 	int			segno;
 	int			min_inactive_seg;
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 
 	/*
 	 * NOTE: mdnblocks makes sure we have opened all active segments, so that
@@ -1102,7 +1156,7 @@ mdimmedsync(SMgrRelation reln, ForkNumber forknum)
 	 */
 	mdnblocks(reln, forknum);
 
-	min_inactive_seg = segno = reln->md_num_open_segs[forknum];
+	min_inactive_seg = segno = mdreln->md_num_open_segs[forknum];
 
 	/*
 	 * Temporarily open inactive segments, then close them after sync.  There
@@ -1110,12 +1164,12 @@ mdimmedsync(SMgrRelation reln, ForkNumber forknum)
 	 * is harmless.  We don't bother to clean them up and take a risk of
 	 * further trouble.  The next mdclose() will soon close them.
 	 */
-	while (_mdfd_openseg(reln, forknum, segno, 0) != NULL)
+	while (_mdfd_openseg(mdreln, forknum, segno, 0) != NULL)
 		segno++;
 
 	while (segno > 0)
 	{
-		MdfdVec    *v = &reln->md_seg_fds[forknum][segno - 1];
+		MdfdVec    *v = &mdreln->md_seg_fds[forknum][segno - 1];
 
 		/*
 		 * fsyncs done through mdimmedsync() should be tracked in a separate
@@ -1136,7 +1190,7 @@ mdimmedsync(SMgrRelation reln, ForkNumber forknum)
 		if (segno > min_inactive_seg)
 		{
 			FileClose(v->mdfd_vfd);
-			_fdvec_resize(reln, forknum, segno - 1);
+			_fdvec_resize(mdreln, forknum, segno - 1);
 		}
 
 		segno--;
@@ -1153,14 +1207,14 @@ mdimmedsync(SMgrRelation reln, ForkNumber forknum)
  * enough to be a performance problem).
  */
 static void
-register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
+register_dirty_segment(MdSMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 {
 	FileTag		tag;
 
-	INIT_MD_FILETAG(tag, reln->smgr_rlocator.locator, forknum, seg->mdfd_segno);
+	INIT_MD_FILETAG(tag, reln->reln.smgr_rlocator.locator, forknum, seg->mdfd_segno);
 
 	/* Temp relations should never be fsync'd */
-	Assert(!SmgrIsTemp(reln));
+	Assert(!SmgrIsTemp(&reln->reln));
 
 	if (!RegisterSyncRequest(&tag, SYNC_REQUEST, false /* retryOnError */ ))
 	{
@@ -1278,7 +1332,7 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
  * _fdvec_resize() -- Resize the fork's open segments array
  */
 static void
-_fdvec_resize(SMgrRelation reln,
+_fdvec_resize(MdSMgrRelation reln,
 			  ForkNumber forknum,
 			  int nseg)
 {
@@ -1316,12 +1370,12 @@ _fdvec_resize(SMgrRelation reln,
  * returned string is palloc'd.
  */
 static char *
-_mdfd_segpath(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
+_mdfd_segpath(MdSMgrRelation reln, ForkNumber forknum, BlockNumber segno)
 {
 	char	   *path,
 			   *fullpath;
 
-	path = relpath(reln->smgr_rlocator, forknum);
+	path = relpath(reln->reln.smgr_rlocator, forknum);
 
 	if (segno > 0)
 	{
@@ -1339,7 +1393,7 @@ _mdfd_segpath(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
  * and make a MdfdVec object for it.  Returns NULL on failure.
  */
 static MdfdVec *
-_mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
+_mdfd_openseg(MdSMgrRelation reln, ForkNumber forknum, BlockNumber segno,
 			  int oflags)
 {
 	MdfdVec    *v;
@@ -1384,7 +1438,7 @@ _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
  * EXTENSION_CREATE case.
  */
 static MdfdVec *
-_mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
+_mdfd_getseg(MdSMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
 			 bool skipFsync, int behavior)
 {
 	MdfdVec    *v;
@@ -1458,7 +1512,7 @@ _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
 				char	   *zerobuf = palloc_aligned(BLCKSZ, PG_IO_ALIGN_SIZE,
 													 MCXT_ALLOC_ZERO);
 
-				mdextend(reln, forknum,
+				mdextend((SMgrRelation) reln, forknum,
 						 nextsegno * ((BlockNumber) RELSEG_SIZE) - 1,
 						 zerobuf, skipFsync);
 				pfree(zerobuf);
@@ -1515,7 +1569,7 @@ _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
  * Get number of blocks present in a single disk file
  */
 static BlockNumber
-_mdnblocks(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
+_mdnblocks(MdSMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 {
 	off_t		len;
 
@@ -1538,7 +1592,7 @@ _mdnblocks(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 int
 mdsyncfiletag(const FileTag *ftag, char *path)
 {
-	SMgrRelation reln = smgropen(ftag->rlocator, InvalidBackendId);
+	MdSMgrRelation reln = (MdSMgrRelation) smgropen(ftag->rlocator, InvalidBackendId);
 	File		file;
 	instr_time	io_start;
 	bool		need_to_close;
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index f76c4605db..d37202609f 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -19,77 +19,23 @@
 
 #include "access/xlogutils.h"
 #include "lib/ilist.h"
+#include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/md.h"
 #include "storage/smgr.h"
+#include "port/atomics.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 
 
-/*
- * This struct of function pointers defines the API between smgr.c and
- * any individual storage manager module.  Note that smgr subfunctions are
- * generally expected to report problems via elog(ERROR).  An exception is
- * that smgr_unlink should use elog(WARNING), rather than erroring out,
- * because we normally unlink relations during post-commit/abort cleanup,
- * and so it's too late to raise an error.  Also, various conditions that
- * would normally be errors should be allowed during bootstrap and/or WAL
- * recovery --- see comments in md.c for details.
- */
-typedef struct f_smgr
-{
-	void		(*smgr_init) (void);	/* may be NULL */
-	void		(*smgr_shutdown) (void);	/* may be NULL */
-	void		(*smgr_open) (SMgrRelation reln);
-	void		(*smgr_close) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_create) (SMgrRelation reln, ForkNumber forknum,
-								bool isRedo);
-	bool		(*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_unlink) (RelFileLocatorBackend rlocator, ForkNumber forknum,
-								bool isRedo);
-	void		(*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
-								BlockNumber blocknum, const void *buffer, bool skipFsync);
-	void		(*smgr_zeroextend) (SMgrRelation reln, ForkNumber forknum,
-									BlockNumber blocknum, int nblocks, bool skipFsync);
-	bool		(*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
-								  BlockNumber blocknum);
-	void		(*smgr_read) (SMgrRelation reln, ForkNumber forknum,
-							  BlockNumber blocknum, void *buffer);
-	void		(*smgr_write) (SMgrRelation reln, ForkNumber forknum,
-							   BlockNumber blocknum, const void *buffer, bool skipFsync);
-	void		(*smgr_writeback) (SMgrRelation reln, ForkNumber forknum,
-								   BlockNumber blocknum, BlockNumber nblocks);
-	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
-	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
-								  BlockNumber nblocks);
-	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
-} f_smgr;
-
-static const f_smgr smgrsw[] = {
-	/* magnetic disk */
-	{
-		.smgr_init = mdinit,
-		.smgr_shutdown = NULL,
-		.smgr_open = mdopen,
-		.smgr_close = mdclose,
-		.smgr_create = mdcreate,
-		.smgr_exists = mdexists,
-		.smgr_unlink = mdunlink,
-		.smgr_extend = mdextend,
-		.smgr_zeroextend = mdzeroextend,
-		.smgr_prefetch = mdprefetch,
-		.smgr_read = mdread,
-		.smgr_write = mdwrite,
-		.smgr_writeback = mdwriteback,
-		.smgr_nblocks = mdnblocks,
-		.smgr_truncate = mdtruncate,
-		.smgr_immedsync = mdimmedsync,
-	}
-};
+static f_smgr *smgrsw;
 
-static const int NSmgr = lengthof(smgrsw);
+static int NSmgr = 0;
+
+static Size LargestSMgrRelationSize = 0;
 
 /*
  * Each backend has a hashtable that stores all extant SMgrRelation objects.
@@ -102,6 +48,57 @@ static dlist_head unowned_relns;
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
 
+SMgrId
+smgr_register(const f_smgr *smgr, Size smgrrelation_size)
+{
+	SMgrId my_id;
+	MemoryContext old;
+
+	if (process_shared_preload_libraries_done)
+		elog(FATAL, "SMgrs must be registered in the shared_preload_libraries phase");
+	if (NSmgr == MaxSMgrId)
+		elog(FATAL, "Too many smgrs registered");
+	if (smgr->name == NULL || *smgr->name == 0)
+		elog(FATAL, "smgr registered with invalid name");
+
+	Assert(smgr->smgr_open != NULL);
+	Assert(smgr->smgr_close != NULL);
+	Assert(smgr->smgr_create != NULL);
+	Assert(smgr->smgr_exists != NULL);
+	Assert(smgr->smgr_unlink != NULL);
+	Assert(smgr->smgr_extend != NULL);
+	Assert(smgr->smgr_zeroextend != NULL);
+	Assert(smgr->smgr_prefetch != NULL);
+	Assert(smgr->smgr_read != NULL);
+	Assert(smgr->smgr_write != NULL);
+	Assert(smgr->smgr_writeback != NULL);
+	Assert(smgr->smgr_nblocks != NULL);
+	Assert(smgr->smgr_truncate != NULL);
+	Assert(smgr->smgr_immedsync != NULL);
+	old = MemoryContextSwitchTo(TopMemoryContext);
+
+	my_id = NSmgr++;
+	if (my_id == 0)
+		smgrsw = palloc(sizeof(f_smgr));
+	else
+		smgrsw = repalloc(smgrsw, sizeof(f_smgr) * NSmgr);
+
+	MemoryContextSwitchTo(old);
+
+	pg_compiler_barrier();
+
+	if (!smgrsw)
+	{
+		NSmgr--;
+		elog(FATAL, "Failed to extend smgr array");
+	}
+
+	memcpy(&smgrsw[my_id], smgr, sizeof(f_smgr));
+
+	LargestSMgrRelationSize = Max(LargestSMgrRelationSize, smgrrelation_size);
+
+	return my_id;
+}
 
 /*
  * smgrinit(), smgrshutdown() -- Initialize or shut down storage
@@ -157,9 +154,11 @@ smgropen(RelFileLocator rlocator, BackendId backend)
 	{
 		/* First time through: initialize the hash table */
 		HASHCTL		ctl;
+		LargestSMgrRelationSize = MAXALIGN(LargestSMgrRelationSize);
+		Assert(NSmgr > 0);
 
 		ctl.keysize = sizeof(RelFileLocatorBackend);
-		ctl.entrysize = sizeof(SMgrRelationData);
+		ctl.entrysize = LargestSMgrRelationSize;
 		SMgrRelationHash = hash_create("smgr relation table", 400,
 									   &ctl, HASH_ELEM | HASH_BLOBS);
 		dlist_init(&unowned_relns);
@@ -180,7 +179,8 @@ smgropen(RelFileLocator rlocator, BackendId backend)
 		reln->smgr_targblock = InvalidBlockNumber;
 		for (int i = 0; i <= MAX_FORKNUM; ++i)
 			reln->smgr_cached_nblocks[i] = InvalidBlockNumber;
-		reln->smgr_which = 0;	/* we only have md.c at present */
+
+		reln->smgr_which = MdSMgrId;	/* we only have md.c at present */
 
 		/* implementation-specific initialization */
 		smgrsw[reln->smgr_which].smgr_open(reln);
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index a604432126..dab4be80c9 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -42,6 +42,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/md.h"
 #include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
@@ -199,6 +200,9 @@ InitStandaloneProcess(const char *argv0)
 	InitProcessLocalLatch();
 	InitializeLatchWaitSet();
 
+	/* Initialize smgrs */
+	register_builtin_dynamic_managers();
+
 	/*
 	 * For consistency with InitPostmasterChild, initialize signal mask here.
 	 * But we don't unblock SIGQUIT or provide a default handler for it.
@@ -1868,6 +1872,14 @@ process_session_preload_libraries(void)
 				   true);
 }
 
+/*
+ * Register any internal managers.
+ */
+void register_builtin_dynamic_managers(void)
+{
+	mdsmgr_register();
+}
+
 /*
  * process any shared memory requests from preloaded libraries
  */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 14bd574fc2..8f53b6351c 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -488,6 +488,7 @@ extern void TouchSocketLockFiles(void);
 extern void AddToDataDirLockFile(int target_line, const char *str);
 extern bool RecheckDataDirLockFile(void);
 extern void ValidatePgVersion(const char *path);
+extern void register_builtin_dynamic_managers(void);
 extern void process_shared_preload_libraries(void);
 extern void process_session_preload_libraries(void);
 extern void process_shmem_requests(void);
diff --git a/src/include/storage/md.h b/src/include/storage/md.h
index 941879ee6a..beeddfd373 100644
--- a/src/include/storage/md.h
+++ b/src/include/storage/md.h
@@ -19,6 +19,10 @@
 #include "storage/smgr.h"
 #include "storage/sync.h"
 
+/* registration function for md storage manager */
+extern void mdsmgr_register(void);
+extern SMgrId MdSMgrId;
+
 /* md storage manager functionality */
 extern void mdinit(void);
 extern void mdopen(SMgrRelation reln);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index a9a179aaba..5ad1d50e0c 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -18,6 +18,10 @@
 #include "storage/block.h"
 #include "storage/relfilelocator.h"
 
+typedef uint8 SMgrId;
+
+#define MaxSMgrId UINT8_MAX
+
 /*
  * smgr.c maintains a table of SMgrRelation objects, which are essentially
  * cached file handles.  An SMgrRelation is created (if not already present)
@@ -59,14 +63,8 @@ typedef struct SMgrRelationData
 	 * Fields below here are intended to be private to smgr.c and its
 	 * submodules.  Do not touch them from elsewhere.
 	 */
-	int			smgr_which;		/* storage manager selector */
-
-	/*
-	 * for md.c; per-fork arrays of the number of open segments
-	 * (md_num_open_segs) and the segments themselves (md_seg_fds).
-	 */
-	int			md_num_open_segs[MAX_FORKNUM + 1];
-	struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];
+	SMgrId		smgr_which;		/* storage manager selector */
+	int			smgrrelation_size;	/* size of this struct, incl. smgr-specific data */
 
 	/* if unowned, list link in list of all unowned SMgrRelations */
 	dlist_node	node;
@@ -77,6 +75,48 @@ typedef SMgrRelationData *SMgrRelation;
 #define SmgrIsTemp(smgr) \
 	RelFileLocatorBackendIsTemp((smgr)->smgr_rlocator)
 
+/*
+ * This struct of function pointers defines the API between smgr.c and
+ * any individual storage manager module.  Note that smgr subfunctions are
+ * generally expected to report problems via elog(ERROR).  An exception is
+ * that smgr_unlink should use elog(WARNING), rather than erroring out,
+ * because we normally unlink relations during post-commit/abort cleanup,
+ * and so it's too late to raise an error.  Also, various conditions that
+ * would normally be errors should be allowed during bootstrap and/or WAL
+ * recovery --- see comments in md.c for details.
+ */
+typedef struct f_smgr
+{
+	const char *name;
+	void		(*smgr_init) (void);		/* may be NULL */
+	void		(*smgr_shutdown) (void);	/* may be NULL */
+	void		(*smgr_open) (SMgrRelation reln);
+	void		(*smgr_close) (SMgrRelation reln, ForkNumber forknum);
+	void		(*smgr_create) (SMgrRelation reln, ForkNumber forknum,
+								bool isRedo);
+	bool		(*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
+	void		(*smgr_unlink) (RelFileLocatorBackend rlocator, ForkNumber forknum,
+								bool isRedo);
+	void		(*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
+								BlockNumber blocknum, const void *buffer, bool skipFsync);
+	void		(*smgr_zeroextend) (SMgrRelation reln, ForkNumber forknum,
+									BlockNumber blocknum, int nblocks, bool skipFsync);
+	bool		(*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
+								  BlockNumber blocknum);
+	void		(*smgr_read) (SMgrRelation reln, ForkNumber forknum,
+							  BlockNumber blocknum, void *buffer);
+	void		(*smgr_write) (SMgrRelation reln, ForkNumber forknum,
+							   BlockNumber blocknum, const void *buffer, bool skipFsync);
+	void		(*smgr_writeback) (SMgrRelation reln, ForkNumber forknum,
+								   BlockNumber blocknum, BlockNumber nblocks);
+	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
+	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
+								  BlockNumber nblocks);
+	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
+} f_smgr;
+
+extern SMgrId smgr_register(const f_smgr *smgr, Size smgrrelation_size);
+
 extern void smgrinit(void);
 extern SMgrRelation smgropen(RelFileLocator rlocator, BackendId backend);
 extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
-- 
2.39.0
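The header changes above drop the md.c-private fields from SMgrRelationData and have each storage manager register the size of its own relation struct instead. A minimal sketch of that struct-embedding pattern, under stated assumptions: every Demo* name below is hypothetical and is not the patch's code, and calloc stands in for whatever allocation smgr.c really performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fields common to all storage managers. */
typedef struct DemoRelationData
{
	int			which;		/* index of the owning storage manager */
	size_t		size;		/* allocated size, incl. manager-private data */
} DemoRelationData;

/* Hypothetical manager-private struct; the common part must come first. */
typedef struct DemoMdRelationData
{
	DemoRelationData base;
	int			num_open_segs;	/* state that used to live in the base struct */
} DemoMdRelationData;

/* Allocate a zeroed relation of the size its manager registered. */
static DemoRelationData *
demo_relation_alloc(int which, size_t registered_size)
{
	DemoRelationData *reln;

	if (registered_size < sizeof(DemoRelationData))
		return NULL;			/* a manager must at least hold the base */

	reln = calloc(1, registered_size);
	if (reln == NULL)
		return NULL;

	reln->which = which;
	reln->size = registered_size;
	return reln;
}

/* Inside the manager, downcast the generic pointer to reach private state. */
static int
demo_md_open_segment(DemoRelationData *reln)
{
	DemoMdRelationData *mdreln = (DemoMdRelationData *) reln;

	return ++mdreln->num_open_segs;
}
```

The downcast is only valid because the base struct is the first member of the manager-specific struct, which is the same assumption the (MdSMgrRelation) casts in md.c rely on.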


["v1-0002-Prototype-Allow-tablespaces-to-specify-which-SMGR.patch" (application/octet-stream)]

From 8db3e73a6fe60c114335a47432a80ecb447b9357 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Fri, 30 Jun 2023 14:15:36 +0200
Subject: [PATCH v1 2/2] Prototype: Allow tablespaces to specify which SMGR
 they use

This allows for tablespaces that are not present on the local file system.

For now, the default tablespaces (pg_default and pg_global) are still
dependent on the md.c smgr, but in the future this may change as well.
---
 src/backend/access/rmgrdesc/tblspcdesc.c |   2 +-
 src/backend/commands/tablespace.c        | 182 +++----------------
 src/backend/parser/gram.y                |  32 +++-
 src/backend/storage/smgr/md.c            | 214 ++++++++++++++++++++++-
 src/backend/storage/smgr/smgr.c          |  72 +++++++-
 src/backend/utils/cache/spccache.c       |  38 +++-
 src/include/catalog/pg_tablespace.dat    |   6 +-
 src/include/catalog/pg_tablespace.h      |   1 +
 src/include/commands/tablespace.h        |   3 +-
 src/include/nodes/parsenodes.h           |   3 +-
 src/include/storage/md.h                 |  10 ++
 src/include/storage/smgr.h               |  20 ++-
 src/include/utils/spccache.h             |   2 +
 13 files changed, 407 insertions(+), 178 deletions(-)
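The diff below routes the tablespace operations through a storage manager that is looked up by name. As a rough, standalone sketch of that dispatch (not the patch's code: all demo_* names are hypothetical, and a fixed-size array stands in for the dynamically grown smgrsw):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_INVALID_ID		(-1)
#define DEMO_MAX_MANAGERS	8

/* Hypothetical, cut-down analogue of f_smgr: a name plus one callback. */
typedef struct DemoSmgr
{
	const char *name;
	void		(*create_tsp) (unsigned int tspoid);
} DemoSmgr;

static DemoSmgr demo_table[DEMO_MAX_MANAGERS];
static int	demo_count = 0;
static unsigned int demo_last_created = 0;

/* Register a manager; its ID is simply its slot in the table. */
static int
demo_register(const DemoSmgr *smgr)
{
	if (demo_count >= DEMO_MAX_MANAGERS)
		return DEMO_INVALID_ID;
	demo_table[demo_count] = *smgr;
	return demo_count++;
}

/* Resolve a name to an ID; IDs are volatile, names are the identity. */
static int
demo_lookup(const char *name)
{
	for (int id = 0; id < demo_count; id++)
	{
		if (strcmp(demo_table[id].name, name) == 0)
			return id;
	}
	return DEMO_INVALID_ID;
}

/* Dispatch a tablespace creation to the named manager; -1 if unknown. */
static int
demo_create_tsp(const char *name, unsigned int tspoid)
{
	int			id = demo_lookup(name);

	if (id == DEMO_INVALID_ID)
		return -1;
	demo_table[id].create_tsp(tspoid);
	return 0;
}

/* Example callback: just remember which tablespace was created. */
static void
demo_md_create_tsp(unsigned int tspoid)
{
	demo_last_created = tspoid;
}
```

Storing the name rather than the ID in the catalog and in WAL matters because the ID depends on registration order, which can differ between configurations.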

diff --git a/src/backend/access/rmgrdesc/tblspcdesc.c b/src/backend/access/rmgrdesc/tblspcdesc.c
index b8c89f8c54..04cc15e121 100644
--- a/src/backend/access/rmgrdesc/tblspcdesc.c
+++ b/src/backend/access/rmgrdesc/tblspcdesc.c
@@ -27,7 +27,7 @@ tblspc_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) rec;
 
-		appendStringInfo(buf, "%u \"%s\"", xlrec->ts_id, xlrec->ts_path);
+		appendStringInfo(buf, "%u \"%s\"", xlrec->ts_id, NameStr(xlrec->ts_smgr));
 	}
 	else if (info == XLOG_TBLSPC_DROP)
 	{
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 13b0dee146..b3da4a1b93 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -74,6 +74,7 @@
 #include "miscadmin.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
+#include "storage/md.h"
 #include "storage/lmgr.h"
 #include "storage/standby.h"
 #include "utils/acl.h"
@@ -92,8 +93,6 @@ bool		allow_in_place_tablespaces = false;
 
 Oid			binary_upgrade_next_pg_tablespace_oid = InvalidOid;
 
-static void create_tablespace_directories(const char *location,
-										  const Oid tablespaceoid);
 static bool destroy_tablespace_directories(Oid tablespaceoid, bool redo);
 
 
@@ -218,10 +217,8 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
 	bool		nulls[Natts_pg_tablespace] = {0};
 	HeapTuple	tuple;
 	Oid			tablespaceoid;
-	char	   *location;
 	Oid			ownerId;
 	Datum		newOptions;
-	bool		in_place;
 
 	/* Must be superuser */
 	if (!superuser())
@@ -237,47 +234,7 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
 	else
 		ownerId = GetUserId();
 
-	/* Unix-ify the offered path, and strip any trailing slashes */
-	location = pstrdup(stmt->location);
-	canonicalize_path(location);
-
-	/* disallow quotes, else CREATE DATABASE would be at risk */
-	if (strchr(location, '\''))
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_NAME),
-				 errmsg("tablespace location cannot contain single quotes")));
-
-	in_place = allow_in_place_tablespaces && strlen(location) == 0;
-
-	/*
-	 * Allowing relative paths seems risky
-	 *
-	 * This also helps us ensure that location is not empty or whitespace,
-	 * unless specifying a developer-only in-place tablespace.
-	 */
-	if (!in_place && !is_absolute_path(location))
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-				 errmsg("tablespace location must be an absolute path")));
-
-	/*
-	 * Check that location isn't too long. Remember that we're going to append
-	 * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'.  FYI, we never actually
-	 * reference the whole path here, but MakePGDirectory() uses the first two
-	 * parts.
-	 */
-	if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
-		OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1 + OIDCHARS > MAXPGPATH)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-				 errmsg("tablespace location \"%s\" is too long",
-						location)));
-
-	/* Warn if the tablespace is in the data directory. */
-	if (path_is_prefix_of_path(DataDir, location))
-		ereport(WARNING,
-				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-				 errmsg("tablespace location should not be inside the data directory")));
+	smgrvalidatetspopts(stmt->smgr, stmt->smgropts);
 
 	/*
 	 * Disallow creation of tablespaces named "pg_xxx"; we reserve this
@@ -334,6 +291,8 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
 	values[Anum_pg_tablespace_oid - 1] = ObjectIdGetDatum(tablespaceoid);
 	values[Anum_pg_tablespace_spcname - 1] =
 		DirectFunctionCall1(namein, CStringGetDatum(stmt->tablespacename));
+	values[Anum_pg_tablespace_spcsmgr - 1] =
+		DirectFunctionCall1(namein, CStringGetDatum(stmt->smgr));
 	values[Anum_pg_tablespace_spcowner - 1] =
 		ObjectIdGetDatum(ownerId);
 	nulls[Anum_pg_tablespace_spcacl - 1] = true;
@@ -360,18 +319,22 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
 	/* Post creation hook for new tablespace */
 	InvokeObjectPostCreateHook(TableSpaceRelationId, tablespaceoid, 0);
 
-	create_tablespace_directories(location, tablespaceoid);
+	smgrcreatetsp(stmt->smgr, tablespaceoid, stmt->smgropts, 0);
 
 	/* Record the filesystem change in XLOG */
 	{
-		xl_tblspc_create_rec xlrec;
+		xl_tblspc_create_rec xlrec = {0};
+		Datum	smgropts;
 
 		xlrec.ts_id = tablespaceoid;
+		namestrcpy(&xlrec.ts_smgr, stmt->smgr);
+		smgropts = transformRelOptions((Datum) 0, stmt->smgropts,
+									   NULL, NULL, false, false);
 
 		XLogBeginInsert();
 		XLogRegisterData((char *) &xlrec,
-						 offsetof(xl_tblspc_create_rec, ts_path));
-		XLogRegisterData((char *) location, strlen(location) + 1);
+						 offsetof(xl_tblspc_create_rec, ts_smgropts));
+		XLogRegisterData((char *) smgropts, VARSIZE_ANY(smgropts));
 
 		(void) XLogInsert(RM_TBLSPC_ID, XLOG_TBLSPC_CREATE);
 	}
@@ -384,8 +347,6 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
 	 */
 	ForceSyncCommit();
 
-	pfree(location);
-
 	/* We keep the lock on pg_tablespace until commit */
 	table_close(rel, NoLock);
 
@@ -401,6 +362,7 @@ void
 DropTableSpace(DropTableSpaceStmt *stmt)
 {
 	char	   *tablespacename = stmt->tablespacename;
+	char	   *smgrname;
 	TableScanDesc scandesc;
 	Relation	rel;
 	HeapTuple	tuple;
@@ -444,6 +406,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 
 	spcform = (Form_pg_tablespace) GETSTRUCT(tuple);
 	tablespaceoid = spcform->oid;
+	smgrname = pstrdup(NameStr(spcform->spcsmgr));
 
 	/* Must be tablespace owner */
 	if (!object_ownercheck(TableSpaceRelationId, tablespaceoid, GetUserId()))
@@ -492,6 +455,8 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 	 */
 	LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
 
+	smgrdroptsp(smgrname, tablespaceoid, false);
+
 	/*
 	 * Try to remove the physical infrastructure.
 	 */
@@ -567,114 +532,6 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 	table_close(rel, NoLock);
 }
 
-
-/*
- * create_tablespace_directories
- *
- *	Attempt to create filesystem infrastructure linking $PGDATA/pg_tblspc/
- *	to the specified directory
- */
-static void
-create_tablespace_directories(const char *location, const Oid tablespaceoid)
-{
-	char	   *linkloc;
-	char	   *location_with_version_dir;
-	struct stat st;
-	bool		in_place;
-
-	linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
-
-	/*
-	 * If we're asked to make an 'in place' tablespace, create the directory
-	 * directly where the symlink would normally go.  This is a developer-only
-	 * option for now, to facilitate regression testing.
-	 */
-	in_place = strlen(location) == 0;
-
-	if (in_place)
-	{
-		if (MakePGDirectory(linkloc) < 0 && errno != EEXIST)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not create directory \"%s\": %m",
-							linkloc)));
-	}
-
-	location_with_version_dir = psprintf("%s/%s", in_place ? linkloc : location,
-										 TABLESPACE_VERSION_DIRECTORY);
-
-	/*
-	 * Attempt to coerce target directory to safe permissions.  If this fails,
-	 * it doesn't exist or has the wrong owner.  Not needed for in-place mode,
-	 * because in that case we created the directory with the desired
-	 * permissions.
-	 */
-	if (!in_place && chmod(location, pg_dir_create_mode) != 0)
-	{
-		if (errno == ENOENT)
-			ereport(ERROR,
-					(errcode(ERRCODE_UNDEFINED_FILE),
-					 errmsg("directory \"%s\" does not exist", location),
-					 InRecovery ? errhint("Create this directory for the tablespace before "
-										  "restarting the server.") : 0));
-		else
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not set permissions on directory \"%s\": %m",
-							location)));
-	}
-
-	/*
-	 * The creation of the version directory prevents more than one tablespace
-	 * in a single location.  This imitates TablespaceCreateDbspace(), but it
-	 * ignores concurrency and missing parent directories.  The chmod() would
-	 * have failed in the absence of a parent.  pg_tablespace_spcname_index
-	 * prevents concurrency.
-	 */
-	if (stat(location_with_version_dir, &st) < 0)
-	{
-		if (errno != ENOENT)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not stat directory \"%s\": %m",
-							location_with_version_dir)));
-		else if (MakePGDirectory(location_with_version_dir) < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not create directory \"%s\": %m",
-							location_with_version_dir)));
-	}
-	else if (!S_ISDIR(st.st_mode))
-		ereport(ERROR,
-				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("\"%s\" exists but is not a directory",
-						location_with_version_dir)));
-	else if (!InRecovery)
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_IN_USE),
-				 errmsg("directory \"%s\" already in use as a tablespace",
-						location_with_version_dir)));
-
-	/*
-	 * In recovery, remove old symlink, in case it points to the wrong place.
-	 */
-	if (!in_place && InRecovery)
-		remove_tablespace_symlink(linkloc);
-
-	/*
-	 * Create the symlink under PGDATA
-	 */
-	if (!in_place && symlink(location, linkloc) < 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not create symbolic link \"%s\": %m",
-						linkloc)));
-
-	pfree(linkloc);
-	pfree(location_with_version_dir);
-}
-
-
 /*
  * destroy_tablespace_directories
  *
@@ -1524,9 +1381,12 @@ tblspc_redo(XLogReaderState *record)
 	if (info == XLOG_TBLSPC_CREATE)
 	{
 		xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
-		char	   *location = xlrec->ts_path;
+		smgrcreatetsp(NameStr(xlrec->ts_smgr), xlrec->ts_id,
+					  untransformRelOptions((Datum) &xlrec->ts_smgropts), true);
 
-		create_tablespace_directories(location, xlrec->ts_id);
+		/*
+		 * create_tablespace_directories(location, xlrec->ts_id);
+		 */
 	}
 	else if (info == XLOG_TBLSPC_DROP)
 	{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 39ab7eac0d..49742553d4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -60,6 +60,7 @@
 #include "nodes/nodeFuncs.h"
 #include "parser/parser.h"
 #include "storage/lmgr.h"
+#include "storage/md.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
 #include "utils/numeric.h"
@@ -394,6 +395,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 				opt_inline_handler opt_validator validator_clause
 				opt_collate
 
+%type <node>	OptTableSpaceStorage
+
 %type <range>	qualified_name insert_target OptConstrFromTable
 
 %type <str>		all_Op MathOp
@@ -4931,18 +4934,35 @@ opt_procedural:
 /*****************************************************************************
  *
  *		QUERY:
- *             CREATE TABLESPACE tablespace LOCATION '/path/to/tablespace/'
+ *             CREATE TABLESPACE tablespace
+ *                 [ OWNER role ]
+ *                 { LOCATION '/path/to/tablespace/' | USING smgr ( option [, ...] ) }
+ *                 [ WITH ( option [ , ... ] ) ]
  *
  *****************************************************************************/
 
-CreateTableSpaceStmt: CREATE TABLESPACE name OptTableSpaceOwner LOCATION Sconst opt_reloptions
+CreateTableSpaceStmt: CREATE TABLESPACE name OptTableSpaceOwner OptTableSpaceStorage opt_reloptions
 				{
-					CreateTableSpaceStmt *n = makeNode(CreateTableSpaceStmt);
-
+					CreateTableSpaceStmt *n = (CreateTableSpaceStmt *) $5;
 					n->tablespacename = $3;
 					n->owner = $4;
-					n->location = $6;
-					n->options = $7;
+					n->options = $6;
+					$$ = (Node *) n;
+				}
+		;
+
+OptTableSpaceStorage: LOCATION Sconst
+				{
+					CreateTableSpaceStmt *n = makeNode(CreateTableSpaceStmt);
+					n->smgr = MD_SMGR_NAME;
+					n->smgropts = list_make1(makeDefElem("location", (Node *) makeString($2), @1));
+					$$ = (Node *) n;
+				}
+			| USING name '(' utility_option_list ')'
+				{
+					CreateTableSpaceStmt *n = makeNode(CreateTableSpaceStmt);
+					n->smgr = $2;
+					n->smgropts = $4;
 					$$ = (Node *) n;
 				}
 		;
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 690bdd27c5..dfc5a11da4 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -22,12 +22,16 @@
 #include "postgres.h"
 
 #include <unistd.h>
+#include <dirent.h>
 #include <fcntl.h>
 #include <sys/file.h>
+#include <sys/stat.h>
 
 #include "access/xlog.h"
 #include "access/xlogutils.h"
+#include "commands/defrem.h"
 #include "commands/tablespace.h"
+#include "common/file_perm.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -156,6 +160,9 @@ void mdsmgr_register(void)
 		.smgr_nblocks = mdnblocks,
 		.smgr_truncate = mdtruncate,
 		.smgr_immedsync = mdimmedsync,
+		.smgr_validate_tspopts = mdvalidatetspopts,
+		.smgr_create_tsp = mdcreatetsp,
+		.smgr_drop_tsp = mddroptsp,
 	};
 
 	MdSMgrId = smgr_register(&md_smgr, sizeof(MdSMgrRelationData));
@@ -213,6 +220,7 @@ mdinit(void)
 bool
 mdexists(SMgrRelation reln, ForkNumber forknum)
 {
+	MdSMgrRelation mdreln = (MdSMgrRelation) reln;
 	/*
 	 * Close it first, to ensure that we notice if the fork has been unlinked
 	 * since we opened it.  As an optimization, we can skip that in recovery,
@@ -221,7 +229,7 @@ mdexists(SMgrRelation reln, ForkNumber forknum)
 	if (!InRecovery)
 		mdclose(reln, forknum);
 
-	return (mdopenfork(reln, forknum, EXTENSION_RETURN_NULL) != NULL);
+	return (mdopenfork(mdreln, forknum, EXTENSION_RETURN_NULL) != NULL);
 }
 
 /*
@@ -1672,3 +1680,207 @@ mdfiletagmatches(const FileTag *ftag, const FileTag *candidate)
 	 */
 	return ftag->rlocator.dbOid == candidate->rlocator.dbOid;
 }
+
+void mdvalidatetspopts(List *opts)
+{
+	ListCell   *option;
+	char	   *location;
+	bool		in_place;
+
+	if (list_length(opts) != 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_NAME),
+				 errmsg("the %s storage manager requires exactly one storage option", MD_SMGR_NAME),
+				 errhint("Only LOCATION is supported.")));
+
+	foreach(option, opts)
+	{
+		DefElem    *defel = lfirst_node(DefElem, option);
+
+		if (strcmp(defel->defname, "location") == 0)
+		{
+			location = pstrdup(defGetString(defel));
+		}
+		else
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("unrecognised option '%s' for the %s storage manager",
+							defel->defname, MD_SMGR_NAME),
+					 errhint("Only 'location' is supported."),
+					 errposition(defel->location)));
+		}
+	}
+
+	/* Unix-ify the offered path, and strip any trailing slashes */
+	canonicalize_path(location);
+
+	/* disallow quotes, else CREATE DATABASE would be at risk */
+	if (strchr(location, '\''))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_NAME),
+					errmsg("tablespace location cannot contain single quotes")));
+
+	in_place = allow_in_place_tablespaces && strlen(location) == 0;
+
+	/*
+	 * Allowing relative paths seems risky
+	 *
+	 * This also helps us ensure that location is not empty or whitespace,
+	 * unless specifying a developer-only in-place tablespace.
+	 */
+	if (!in_place && !is_absolute_path(location))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+					errmsg("tablespace location must be an absolute path")));
+
+	/*
+	 * Check that location isn't too long. Remember that we're going to append
+	 * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'.  FYI, we never actually
+	 * reference the whole path here, but MakePGDirectory() uses the first two
+	 * parts.
+	 */
+	if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
+		OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1 + OIDCHARS > MAXPGPATH)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+					errmsg("tablespace location \"%s\" is too long",
+						   location)));
+
+	/* Warn if the tablespace is in the data directory. */
+	if (path_is_prefix_of_path(DataDir, location))
+		ereport(WARNING,
+				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+					errmsg("tablespace location should not be inside the data directory")));
+
+	pfree(location);
+}
+
+void mdcreatetsp(Oid tablespaceoid, List *opts, bool isredo)
+{
+	char	   *location;
+	DefElem	   *defel = (DefElem *) linitial_node(DefElem, opts);
+
+	Assert(strcmp(defel->defname, "location") == 0);
+	Assert(list_length(opts) == 1);
+
+	location = pstrdup(defGetString(defel));
+
+	/* Unix-ify the offered path, and strip any trailing slashes */
+	canonicalize_path(location);
+
+	create_tablespace_directories(location, tablespaceoid);
+
+	pfree(location);
+}
+
+void mddroptsp(Oid tsp, bool isredo)
+{
+	/* Not implemented yet in this prototype. */
+}
+
+/*
+ * create_tablespace_directories
+ *
+ *	Attempt to create filesystem infrastructure linking $PGDATA/pg_tblspc/
+ *	to the specified directory
+ */
+void
+create_tablespace_directories(const char *location, const Oid tablespaceoid)
+{
+	char	   *linkloc;
+	char	   *location_with_version_dir;
+	struct stat st;
+	bool		in_place;
+
+	linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
+
+	/*
+	 * If we're asked to make an 'in place' tablespace, create the directory
+	 * directly where the symlink would normally go.  This is a developer-only
+	 * option for now, to facilitate regression testing.
+	 */
+	in_place = strlen(location) == 0;
+
+	if (in_place)
+	{
+		if (MakePGDirectory(linkloc) < 0 && errno != EEXIST)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+						errmsg("could not create directory \"%s\": %m",
+							   linkloc)));
+	}
+
+	location_with_version_dir = psprintf("%s/%s", in_place ? linkloc : location,
+										 TABLESPACE_VERSION_DIRECTORY);
+
+	/*
+	 * Attempt to coerce target directory to safe permissions.  If this fails,
+	 * it doesn't exist or has the wrong owner.  Not needed for in-place mode,
+	 * because in that case we created the directory with the desired
+	 * permissions.
+	 */
+	if (!in_place && chmod(location, pg_dir_create_mode) != 0)
+	{
+		if (errno == ENOENT)
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_FILE),
+						errmsg("directory \"%s\" does not exist", location),
+						InRecovery ? errhint("Create this directory for the tablespace before "
+											 "restarting the server.") : 0));
+		else
+			ereport(ERROR,
+					(errcode_for_file_access(),
+						errmsg("could not set permissions on directory \"%s\": %m",
+							   location)));
+	}
+
+	/*
+	 * The creation of the version directory prevents more than one tablespace
+	 * in a single location.  This imitates TablespaceCreateDbspace(), but it
+	 * ignores concurrency and missing parent directories.  The chmod() would
+	 * have failed in the absence of a parent.  pg_tablespace_spcname_index
+	 * prevents concurrency.
+	 */
+	if (stat(location_with_version_dir, &st) < 0)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+						errmsg("could not stat directory \"%s\": %m",
+							   location_with_version_dir)));
+		else if (MakePGDirectory(location_with_version_dir) < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+						errmsg("could not create directory \"%s\": %m",
+							   location_with_version_dir)));
+	}
+	else if (!S_ISDIR(st.st_mode))
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+					errmsg("\"%s\" exists but is not a directory",
+						   location_with_version_dir)));
+	else if (!InRecovery)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_IN_USE),
+					errmsg("directory \"%s\" already in use as a tablespace",
+						   location_with_version_dir)));
+
+	/*
+	 * In recovery, remove old symlink, in case it points to the wrong place.
+	 */
+	if (!in_place && InRecovery)
+		remove_tablespace_symlink(linkloc);
+
+	/*
+	 * Create the symlink under PGDATA
+	 */
+	if (!in_place && symlink(location, linkloc) < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+					errmsg("could not create symbolic link \"%s\": %m",
+						   linkloc)));
+
+	pfree(linkloc);
+	pfree(location_with_version_dir);
+}
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index d37202609f..b5cb720064 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -18,6 +18,7 @@
 #include "postgres.h"
 
 #include "access/xlogutils.h"
+#include "catalog/pg_tablespace_d.h"
 #include "lib/ilist.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
@@ -29,7 +30,7 @@
 #include "utils/hsearch.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
-
+#include "utils/spccache.h"
 
 static f_smgr *smgrsw;
 
@@ -174,13 +175,25 @@ smgropen(RelFileLocator rlocator, BackendId backend)
 	/* Initialize it if not present before */
 	if (!found)
 	{
+		Oid		tspid = reln->smgr_rlocator.locator.spcOid;
 		/* hash_search already filled in the lookup key */
 		reln->smgr_owner = NULL;
 		reln->smgr_targblock = InvalidBlockNumber;
 		for (int i = 0; i <= MAX_FORKNUM; ++i)
 			reln->smgr_cached_nblocks[i] = InvalidBlockNumber;
 
-		reln->smgr_which = MdSMgrId;	/* we only have md.c at present */
+		/*
+		 * There is a chicken-and-egg problem in determining which storage
+		 * manager to use for the global tablespace: it holds pg_tablespace,
+		 * the very catalog we would consult to look this up.
+		 *
+		 * For now, pg_global and pg_default therefore always use the
+		 * default storage manager, md.c (MD_SMGR_NAME).
+		 */
+		if (tspid == GLOBALTABLESPACE_OID || tspid == DEFAULTTABLESPACE_OID)
+			reln->smgr_which = get_smgr_id(MD_SMGR_NAME, false);
+		else
+			reln->smgr_which = get_tablespace_smgrid(tspid);
 
 		/* implementation-specific initialization */
 		smgrsw[reln->smgr_which].smgr_open(reln);
@@ -722,6 +735,61 @@ smgrimmedsync(SMgrRelation reln, ForkNumber forknum)
 	smgrsw[reln->smgr_which].smgr_immedsync(reln, forknum);
 }
 
+static const char *recent_smgrname = NULL;
+static SMgrId recent_smgrid = -1;
+
+static SMgrId get_smgr_by_name(const char *smgrname, bool missing_ok)
+{
+	if (recent_smgrname != NULL && strcmp(smgrname, recent_smgrname) == 0)
+		return recent_smgrid;
+
+	for (SMgrId id = 0; id < NSmgr; id++)
+	{
+		f_smgr *smgr = &smgrsw[id];
+
+		if (strcmp(smgrname, smgr->name) == 0)
+		{
+			recent_smgrname = smgr->name;
+			recent_smgrid = id;
+			return id;
+		}
+	}
+
+	if (missing_ok)
+		return InvalidSmgrId;
+
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_NAME),
+			 errmsg("invalid smgr '%s'", smgrname)));
+}
+
+
+SMgrId get_smgr_id(const char *smgrname, bool missing_ok)
+{
+	return get_smgr_by_name(smgrname, missing_ok);
+}
+
+void smgrvalidatetspopts(const char *smgrname, List *opts)
+{
+	SMgrId smgrid = get_smgr_by_name(smgrname, false);
+
+	smgrsw[smgrid].smgr_validate_tspopts(opts);
+}
+
+void smgrcreatetsp(const char *smgrname, Oid tsp, List *opts, bool isredo)
+{
+	SMgrId smgrid = get_smgr_by_name(smgrname, false);
+
+	smgrsw[smgrid].smgr_create_tsp(tsp, opts, isredo);
+}
+
+void smgrdroptsp(const char *smgrname, Oid tsp, bool isredo)
+{
+	SMgrId smgrid = get_smgr_by_name(smgrname, false);
+
+	smgrsw[smgrid].smgr_drop_tsp(tsp, isredo);
+}
+
 /*
  * AtEOXact_SMgr
  *
diff --git a/src/backend/utils/cache/spccache.c b/src/backend/utils/cache/spccache.c
index 136fd737d3..ce7e403b53 100644
--- a/src/backend/utils/cache/spccache.c
+++ b/src/backend/utils/cache/spccache.c
@@ -24,6 +24,8 @@
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
 #include "storage/bufmgr.h"
+#include "storage/smgr.h"
+#include "storage/md.h"
 #include "utils/catcache.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
@@ -38,6 +40,7 @@ static HTAB *TableSpaceCacheHash = NULL;
 typedef struct
 {
 	Oid			oid;			/* lookup key - must be first */
+	SMgrId		smgrid;			/* cached storage manager id */
 	TableSpaceOpts *opts;		/* options, or NULL if none */
 } TableSpaceCacheEntry;
 
@@ -98,7 +101,7 @@ InitializeTableSpaceCache(void)
 
 /*
  * get_tablespace
- *		Fetch TableSpaceCacheEntry structure for a specified table OID.
+ *		Fetch TableSpaceCacheEntry structure for a specified tablespace OID.
  *
  * Pointers returned by this function should not be stored, since a cache
  * flush will invalidate them.
@@ -109,6 +112,7 @@ get_tablespace(Oid spcid)
 	TableSpaceCacheEntry *spc;
 	HeapTuple	tp;
 	TableSpaceOpts *opts;
+	SMgrId		smgrid;
 
 	/*
 	 * Since spcid is always from a pg_class tuple, InvalidOid implies the
@@ -135,18 +139,32 @@ get_tablespace(Oid spcid)
 	 */
 	tp = SearchSysCache1(TABLESPACEOID, ObjectIdGetDatum(spcid));
 	if (!HeapTupleIsValid(tp))
+	{
 		opts = NULL;
+		smgrid = InvalidSmgrId;
+	}
 	else
 	{
 		Datum		datum;
 		bool		isNull;
+		char	   *smgrname;
+
+		smgrname = NameStr(*DatumGetName(SysCacheGetAttr(TABLESPACEOID,
+														 tp,
+														 Anum_pg_tablespace_spcsmgr,
+														 &isNull)));
+
+		Assert(!isNull);
+		smgrid = get_smgr_id(smgrname, false);
 
 		datum = SysCacheGetAttr(TABLESPACEOID,
 								tp,
 								Anum_pg_tablespace_spcoptions,
 								&isNull);
 		if (isNull)
+		{
 			opts = NULL;
+		}
 		else
 		{
 			bytea	   *bytea_opts = tablespace_reloptions(datum, false);
@@ -167,6 +185,8 @@ get_tablespace(Oid spcid)
 											   HASH_ENTER,
 											   NULL);
 	spc->opts = opts;
+	spc->smgrid = smgrid;
+
 	return spc;
 }
 
@@ -235,3 +255,19 @@ get_tablespace_maintenance_io_concurrency(Oid spcid)
 	else
 		return spc->opts->maintenance_io_concurrency;
 }
+
+/*
+ * get_tablespace_smgrid
+ */
+SMgrId
+get_tablespace_smgrid(Oid spcid)
+{
+	TableSpaceCacheEntry *spc;
+
+	if (spcid == GLOBALTABLESPACE_OID || spcid == DEFAULTTABLESPACE_OID)
+		return get_smgr_id(MD_SMGR_NAME, false);
+
+	spc = get_tablespace(spcid);
+
+	return spc->smgrid;
+}
diff --git a/src/include/catalog/pg_tablespace.dat b/src/include/catalog/pg_tablespace.dat
index 9fbc98a44d..5e20429619 100644
--- a/src/include/catalog/pg_tablespace.dat
+++ b/src/include/catalog/pg_tablespace.dat
@@ -13,8 +13,10 @@
 [
 
 { oid => '1663', oid_symbol => 'DEFAULTTABLESPACE_OID',
-  spcname => 'pg_default', spcacl => '_null_', spcoptions => '_null_' },
+  spcname => 'pg_default', spcacl => '_null_', spcsmgr => 'md',
+  spcoptions => '_null_' },
 { oid => '1664', oid_symbol => 'GLOBALTABLESPACE_OID',
-  spcname => 'pg_global', spcacl => '_null_', spcoptions => '_null_' },
+  spcname => 'pg_global', spcacl => '_null_', spcsmgr => 'md',
+  spcoptions => '_null_' },
 
 ]
diff --git a/src/include/catalog/pg_tablespace.h b/src/include/catalog/pg_tablespace.h
index ea1593d874..9385933c05 100644
--- a/src/include/catalog/pg_tablespace.h
+++ b/src/include/catalog/pg_tablespace.h
@@ -30,6 +30,7 @@ CATALOG(pg_tablespace,1213,TableSpaceRelationId) BKI_SHARED_RELATION
 {
 	Oid			oid;			/* oid */
 	NameData	spcname;		/* tablespace name */
+	NameData	spcsmgr;		/* tablespace storage manager */
 
 	/* owner of tablespace */
 	Oid			spcowner BKI_DEFAULT(POSTGRES) BKI_LOOKUP(pg_authid);
diff --git a/src/include/commands/tablespace.h b/src/include/commands/tablespace.h
index f1961c1813..15220ffb99 100644
--- a/src/include/commands/tablespace.h
+++ b/src/include/commands/tablespace.h
@@ -28,7 +28,8 @@ extern PGDLLIMPORT bool allow_in_place_tablespaces;
 typedef struct xl_tblspc_create_rec
 {
 	Oid			ts_id;
-	char		ts_path[FLEXIBLE_ARRAY_MEMBER]; /* null-terminated string */
+	NameData	ts_smgr;
+	char		ts_smgropts[FLEXIBLE_ARRAY_MEMBER];
 } xl_tblspc_create_rec;
 
 typedef struct xl_tblspc_drop_rec
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b3bec90e52..e167acec7d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2613,8 +2613,9 @@ typedef struct CreateTableSpaceStmt
 {
 	NodeTag		type;
 	char	   *tablespacename;
+	char	   *smgr;
+	List	   *smgropts; /* list of DefElem nodes */
 	RoleSpec   *owner;
-	char	   *location;
 	List	   *options;
 } CreateTableSpaceStmt;
 
diff --git a/src/include/storage/md.h b/src/include/storage/md.h
index beeddfd373..a397aa1c10 100644
--- a/src/include/storage/md.h
+++ b/src/include/storage/md.h
@@ -21,6 +21,8 @@
 
 /* registration function for md storage manager */
 extern void mdsmgr_register(void);
+
+#define MD_SMGR_NAME "md"
 extern SMgrId MdSMgrId;
 
 /* md storage manager functionality */
@@ -55,4 +57,12 @@ extern int	mdsyncfiletag(const FileTag *ftag, char *path);
 extern int	mdunlinkfiletag(const FileTag *ftag, char *path);
 extern bool mdfiletagmatches(const FileTag *ftag, const FileTag *candidate);
 
+/* md tsp callbacks */
+extern void mdvalidatetspopts(List *opts);
+extern void mdcreatetsp(Oid tsp, List *opts, bool isredo);
+extern void mddroptsp(Oid tsp, bool isredo);
+extern void create_tablespace_directories(const char *location,
+										  const Oid tablespaceoid);
+
+
 #endif							/* MD_H */
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 5ad1d50e0c..12a9b5f00e 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -15,12 +15,18 @@
 #define SMGR_H
 
 #include "lib/ilist.h"
+#include "nodes/pg_list.h"
 #include "storage/block.h"
 #include "storage/relfilelocator.h"
 
-typedef uint8 SMgrId;
+/*
+ * Volatile ID of the smgr. IDs may vary across configurations; the true
+ * identity of each smgr is its name.
+ */
+typedef int SMgrId;
 
-#define MaxSMgrId UINT8_MAX
+#define MaxSMgrId		INT_MAX
+#define InvalidSmgrId	(-1)
 
 /*
  * smgr.c maintains a table of SMgrRelation objects, which are essentially
@@ -113,8 +119,13 @@ typedef struct f_smgr
 	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
 								  BlockNumber nblocks);
 	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
+
+	void		(*smgr_validate_tspopts) (List *tspopts);
+	void		(*smgr_create_tsp) (Oid tspoid, List *tspopts, bool isredo);
+	void		(*smgr_drop_tsp) (Oid tspoid, bool isredo);
 } f_smgr;
 
+extern SMgrId get_smgr_id(const char *smgrname, bool missing_ok);
 extern SMgrId smgr_register(const f_smgr *smgr, Size smgrrelation_size);
 
 extern void smgrinit(void);
@@ -147,6 +158,11 @@ extern BlockNumber smgrnblocks_cached(SMgrRelation reln, ForkNumber forknum);
 extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
 						 int nforks, BlockNumber *nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
+
+extern void smgrvalidatetspopts(const char *smgrname, List *opts);
+extern void smgrcreatetsp(const char *smgrname, Oid tsp, List *opts, bool isredo);
+extern void smgrdroptsp(const char *smgrname, Oid tsp, bool isredo);
+
 extern void AtEOXact_SMgr(void);
 extern bool ProcessBarrierSmgrRelease(void);
 
diff --git a/src/include/utils/spccache.h b/src/include/utils/spccache.h
index c6c754a2ec..6569452e91 100644
--- a/src/include/utils/spccache.h
+++ b/src/include/utils/spccache.h
@@ -12,10 +12,12 @@
  */
 #ifndef SPCCACHE_H
 #define SPCCACHE_H
+#include "storage/smgr.h"
 
 extern void get_tablespace_page_costs(Oid spcid, float8 *spc_random_page_cost,
 									  float8 *spc_seq_page_cost);
 extern int	get_tablespace_io_concurrency(Oid spcid);
 extern int	get_tablespace_maintenance_io_concurrency(Oid spcid);
+extern SMgrId get_tablespace_smgrid(Oid spcid);
 
 #endif							/* SPCCACHE_H */
-- 
2.39.0
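
For readers skimming the header changes above: the comment on SMgrId says
that IDs are volatile and that the name is the true identity of each smgr.
Below is a minimal standalone sketch of what a name-keyed, dynamically
growing registry could look like. It is not the code in the patch. All
demo_* names are hypothetical, the struct is reduced to just a name, and
abort() stands in for whatever error reporting the real code uses. Only
the get_smgr_id(smgrname, missing_ok) shape and InvalidSmgrId are taken
from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the patch's types, so this compiles alone. */
typedef int SMgrId;
#define InvalidSmgrId	(-1)

/* Hypothetical, reduced table entry: the real f_smgr holds callbacks. */
typedef struct demo_smgr
{
    const char *name;           /* stable identity; must outlive the table */
} demo_smgr;

static demo_smgr *demo_smgrsw = NULL;   /* grows as managers register */
static int  demo_nsmgr = 0;             /* entries in use */
static int  demo_capacity = 0;          /* entries allocated */

/*
 * Resolve a storage manager name to its current ID. Returns
 * InvalidSmgrId when the name is unknown and missing_ok is true;
 * aborts when it is unknown and missing_ok is false.
 */
static SMgrId
demo_get_smgr_id(const char *smgrname, bool missing_ok)
{
    assert(smgrname != NULL);

    for (int i = 0; i < demo_nsmgr; i++)
    {
        if (strcmp(demo_smgrsw[i].name, smgrname) == 0)
            return i;
    }

    if (!missing_ok)
        abort();                /* placeholder for real error reporting */
    return InvalidSmgrId;
}

/*
 * Register a storage manager under a unique name and return its ID.
 * The ID is just the slot index, so it depends on registration order.
 */
static SMgrId
demo_smgr_register(const char *name)
{
    /* Names are the identity, so duplicates are rejected. */
    if (demo_get_smgr_id(name, true) != InvalidSmgrId)
        abort();

    /* Double the table when it is full (4 slots initially). */
    if (demo_nsmgr == demo_capacity)
    {
        int         newcap = (demo_capacity > 0) ? demo_capacity * 2 : 4;
        demo_smgr  *newsw = realloc(demo_smgrsw, newcap * sizeof(demo_smgr));

        if (newsw == NULL)
            abort();
        demo_smgrsw = newsw;
        demo_capacity = newcap;
    }

    demo_smgrsw[demo_nsmgr].name = name;
    return demo_nsmgr++;
}
```

Because the ID is only the slot index, loading the same libraries in a
different order would hand out different IDs. That is presumably why the
smgr*tsp functions in the diff take a name rather than an SMgrId.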


