[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-hackers
Subject:    Re: [HACKERS] pg_dump / copy bugs with "big lines" ?
From:       Alvaro Herrera <alvherre () 2ndquadrant ! com>
Date:       2016-02-29 18:30:23
Message-ID: 20160229183023.GA286012 () alvherre ! pgsql
[Download RAW message or body]

A customer of ours was hit by this recently and I'd like to get it fixed
for good.

Robert Haas wrote:
> On Mon, Apr 6, 2015 at 1:51 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> > In any case, I don't think it would be terribly difficult to allow a bit
> > more than 1GB in a StringInfo. Might need to tweak palloc too; ISTR there's
> > some 1GB limits there too.
> 
> The point is, those limits are there on purpose.  Changing things
> arbitrarily wouldn't be hard, but doing it in a principled way is
> likely to require some thought.  For example, in the COPY OUT case,
> presumably what's happening is that we palloc a chunk for each
> individual datum, and then palloc a buffer for the whole row.

Right.  There's a serious problem here already, and users are being
bitten by it in existing releases.  I think we should provide a
non-invasive fix for back-branches which we could apply as a bug fix.
Here's a proposed patch for the localized fix; it just adds a flag to
StringInfo allowing the string to grow beyond the 1GB limit, but only
for strings which are specifically marked.  That way we avoid having to
audit all the StringInfo-using code.  In this patch, only the COPY path
is allowed to grow beyond 1GB, which is enough to close the current
problem -- or at least my test case for it.

If others can try this patch to ensure it enables pg_dump to work on
their databases, it would be great.

(In this patch there's a known buglet which is that the UINT64_FORMAT
patched in the error message doesn't allow for translatability.)

Like Robert, I don't like this approach for the long term, however.  I
think it would be saner to have CopySendData not keep a single gigantic
string in memory; it would be better to get the bytes out to the client
sooner than end-of-row.  To avoid causing a performance hit, we would
only flush when the size of the output buffer is about to reach the
allocation limit (MaxAllocSize); so for regular-sized tuples, we would
only do it at end-of-row, keeping the current behavior.  I don't have a
patch for this yet; I'm going to try that next.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

["huge-stringinfo.patch" (text/x-diff)]

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3eba9ef..fcc4fe6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -445,6 +445,15 @@ SendCopyEnd(CopyState cstate)
 	}
 }
 
+/*
+ * Prepare for output
+ */
+static void
+CopyStartSend(CopyState cstate)
+{
+	allowLongStringInfo(cstate->fe_msgbuf);
+}
+
 /*----------
  * CopySendData sends output data to the destination (file or frontend)
  * CopySendString does the same for null-terminated strings
@@ -1865,6 +1874,8 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
+	CopyStartSend(cstate);
+
 	if (cstate->binary)
 	{
 		/* Binary per-tuple header */
diff --git a/src/backend/lib/stringinfo.c b/src/backend/lib/stringinfo.c
index 7d03090..8c08eb7 100644
--- a/src/backend/lib/stringinfo.c
+++ b/src/backend/lib/stringinfo.c
@@ -47,12 +47,24 @@ initStringInfo(StringInfo str)
 {
 	int			size = 1024;	/* initial default buffer size */
 
-	str->data = (char *) palloc(size);
+	str->data = (char *) palloc(size);	/* no need for "huge" at this point */
 	str->maxlen = size;
+	str->allowlong = false;
 	resetStringInfo(str);
 }
 
 /*
+ * allocLongStringInfo
+ *
+ * Mark the StringInfo as a candidate for a "huge" allocation
+ */
+void
+allowLongStringInfo(StringInfo str)
+{
+	str->allowlong = true;
+}
+
+/*
  * resetStringInfo
  *
  * Reset the StringInfo: the data buffer remains valid, but its
@@ -64,6 +76,7 @@ resetStringInfo(StringInfo str)
 	str->data[0] = '\0';
 	str->len = 0;
 	str->cursor = 0;
+	str->allowlong = false;
 }
 
 /*
@@ -244,7 +257,8 @@ appendBinaryStringInfo(StringInfo str, const char *data, int datalen)
 void
 enlargeStringInfo(StringInfo str, int needed)
 {
-	int			newlen;
+	int64		newlen;
+	Size		limit;
 
 	/*
 	 * Guard against out-of-range "needed" values.  Without this, we can get
@@ -252,16 +266,19 @@ enlargeStringInfo(StringInfo str, int needed)
 	 */
 	if (needed < 0)				/* should not happen */
 		elog(ERROR, "invalid string enlargement request size: %d", needed);
-	if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
+
+	/* choose the proper limit and verify this allocation wouldn't exceed it */
+	limit = str->allowlong ? MaxAllocHugeSize : MaxAllocSize;
+	if (((Size) needed) >= (limit - (Size) str->len))
 		ereport(ERROR,
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 				 errmsg("out of memory"),
-				 errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
+				 errdetail("Cannot enlarge string buffer containing "INT64_FORMAT" bytes by %d more bytes.",
 						   str->len, needed)));
 
 	needed += str->len + 1;		/* total space required now */
 
-	/* Because of the above test, we now have needed <= MaxAllocSize */
+	/* Because of the above test, we now have needed <= limit */
 
 	if (needed <= str->maxlen)
 		return;					/* got enough space already */
@@ -276,14 +293,14 @@ enlargeStringInfo(StringInfo str, int needed)
 		newlen = 2 * newlen;
 
 	/*
-	 * Clamp to MaxAllocSize in case we went past it.  Note we are assuming
-	 * here that MaxAllocSize <= INT_MAX/2, else the above loop could
+	 * Clamp to the limit in case we went past it.  Note we are assuming
+	 * here that MaxAllocHugeSize <= INT64_MAX/2, else the above loop could
 	 * overflow.  We will still have newlen >= needed.
 	 */
-	if (newlen > (int) MaxAllocSize)
-		newlen = (int) MaxAllocSize;
+	if (newlen > (int64) limit)
+		newlen = (int64) limit;
 
-	str->data = (char *) repalloc(str->data, newlen);
+	str->data = (char *) repalloc_huge(str->data, newlen);
 
 	str->maxlen = newlen;
 }
diff --git a/src/include/lib/stringinfo.h b/src/include/lib/stringinfo.h
index 4fff10a..cac6741 100644
--- a/src/include/lib/stringinfo.h
+++ b/src/include/lib/stringinfo.h
@@ -30,14 +30,17 @@
  *		cursor	is initialized to zero by makeStringInfo or initStringInfo,
  *				but is not otherwise touched by the stringinfo.c routines.
  *				Some routines use it to scan through a StringInfo.
+ *		allowlong boolean flag indicating whether this StringInfo can allocate
+ *				more than MaxAllocSize bytes.
  *-------------------------
  */
 typedef struct StringInfoData
 {
 	char	   *data;
-	int			len;
-	int			maxlen;
+	int64		len;
+	int64		maxlen;
 	int			cursor;
+	bool		allowlong;
 } StringInfoData;
 
 typedef StringInfoData *StringInfo;
@@ -79,6 +82,11 @@ extern StringInfo makeStringInfo(void);
 extern void initStringInfo(StringInfo str);
 
 /*------------------------
+ * Set flag to allow "huge" stringinfos.
+ */
+extern void allowLongStringInfo(StringInfo str);
+
+/*------------------------
  * resetStringInfo
  * Clears the current content of the StringInfo, if any. The
  * StringInfo remains valid.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic