List:       pgsql-hackers
Subject:    [HACKERS] WAL log only necessary part of 2PC GID
From:       Pavan Deolasee <pavan.deolasee () gmail ! com>
Date:       2016-02-29 13:57:47
Message-ID: CABOikdOkhZQnHxV_P8-gnrfJ5Pg3JcrVootod98khFMs=wtiTA () mail ! gmail ! com

Hello Hackers,

The maximum size of the GID, used as a 2PC identifier, is currently defined
as 200 bytes (see GIDSIZE in src/backend/access/transam/twophase.c). The
actual GIDs used by applications, though, may be much smaller than that. So
IMO, instead of WAL-logging the entire 200 bytes during PREPARE TRANSACTION,
we should WAL-log only the bytes the GID actually uses, i.e. strlen(gid)
bytes plus the terminating '\0'.

The attached patch does that. The changes are limited to twophase.c, and
some simple crash recovery tests seem to work OK. In terms of
performance, a quick test shows a marginal improvement in tps using the
script that Stas Kelvich used for his work on speeding up two-phase
transactions. The only change I made is to keep :scale unchanged,
because increasing :scale in every iteration results in only a
handful of updates (not sure why Stas had that in his original script):

\set naccounts 100000 * :scale
\setrandom from_aid 1 :naccounts
\setrandom to_aid 1 :naccounts
\setrandom delta 1 100
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid = :from_aid;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :to_aid;
PREPARE TRANSACTION ':client_id.:scale';
COMMIT PREPARED ':client_id.:scale';

The amount of WAL generated during a 60s run shows a decline of about 22%
with default settings, except for full_page_writes, which is turned off.

HEAD: 861 WAL bytes / transaction
PATCH: 670 WAL bytes / transaction

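Actually, the above numbers probably include a lot of WAL generated by
HOT pruning and page defragmentation. If we look only at the WAL
overhead caused by 2PC, i.e. the bytes beyond a plain 1PC commit, the
decline is considerably larger: 861 - 382 = 479 bytes on HEAD versus
670 - 382 = 288 bytes with the patch, or roughly 40% (assuming the 1PC
baseline is unchanged by the patch). I took numbers using simple 1PC for
reference and to understand the overhead of 2PC.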

HEAD (1PC): 382 WAL bytes / transaction

Thanks,
Pavan

-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


["reduce_gid_wal.patch" (application/octet-stream)]

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 8a22836..e4e88b6 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -866,7 +866,7 @@ typedef struct TwoPhaseFileHeader
 	int32		nabortrels;		/* number of delete-on-abort rels */
 	int32		ninvalmsgs;		/* number of cache invalidation messages */
 	bool		initfileinval;	/* does relcache init file need invalidation? */
-	char		gid[GIDSIZE];	/* GID for transaction */
+	uint32		gidlen;			/* length of the GID - GID follows the header */
 } TwoPhaseFileHeader;
 
 /*
@@ -977,9 +977,10 @@ StartPrepare(GlobalTransaction gxact)
 	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels);
 	hdr.ninvalmsgs = xactGetCommittedInvalidationMessages(&invalmsgs,
 														  &hdr.initfileinval);
-	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
+	hdr.gidlen = strlen(gxact->gid) + 1; /* Include '\0' */
 
 	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
+	save_state_data(gxact->gid, hdr.gidlen);
 
 	/*
 	 * Add the additional info about subxacts, deletable files and cache
@@ -1360,6 +1361,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	hdr = (TwoPhaseFileHeader *) buf;
 	Assert(TransactionIdEquals(hdr->xid, xid));
 	bufptr = buf + MAXALIGN(sizeof(TwoPhaseFileHeader));
+	bufptr += MAXALIGN(hdr->gidlen);
 	children = (TransactionId *) bufptr;
 	bufptr += MAXALIGN(hdr->nsubxacts * sizeof(TransactionId));
 	commitrels = (RelFileNode *) bufptr;
@@ -1915,6 +1917,7 @@ RecoverPreparedTransactions(void)
 			TwoPhaseFileHeader *hdr;
 			TransactionId *subxids;
 			GlobalTransaction gxact;
+			const char	*gid;
 			int			i;
 
 			xid = (TransactionId) strtoul(clde->d_name, NULL, 16);
@@ -1947,6 +1950,8 @@ RecoverPreparedTransactions(void)
 			hdr = (TwoPhaseFileHeader *) buf;
 			Assert(TransactionIdEquals(hdr->xid, xid));
 			bufptr = buf + MAXALIGN(sizeof(TwoPhaseFileHeader));
+			gid = (const char *) bufptr;
+			bufptr += MAXALIGN(hdr->gidlen);
 			subxids = (TransactionId *) bufptr;
 			bufptr += MAXALIGN(hdr->nsubxacts * sizeof(TransactionId));
 			bufptr += MAXALIGN(hdr->ncommitrels * sizeof(RelFileNode));
@@ -1975,7 +1980,7 @@ RecoverPreparedTransactions(void)
 			/*
 			 * Recreate its GXACT and dummy PGPROC
 			 */
-			gxact = MarkAsPreparing(xid, hdr->gid,
+			gxact = MarkAsPreparing(xid, gid,
 									hdr->prepared_at,
 									hdr->owner, hdr->database);
 			gxact->ondisk = true;

