[prev in list] [next in list] [prev in thread] [next in thread]
List: pgsql-hackers
Subject: Re: [HACKERS] pg_dump / copy bugs with "big lines" ?
From: Alvaro Herrera <alvherre () 2ndquadrant ! com>
Date: 2016-02-29 18:30:23
Message-ID: 20160229183023.GA286012 () alvherre ! pgsql
[Download RAW message or body]
A customer of ours was hit by this recently and I'd like to get it fixed
for good.
Robert Haas wrote:
> On Mon, Apr 6, 2015 at 1:51 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> > In any case, I don't think it would be terribly difficult to allow a bit
> > more than 1GB in a StringInfo. Might need to tweak palloc too; ISTR there's
> > some 1GB limits there too.
>
> The point is, those limits are there on purpose. Changing things
> arbitrarily wouldn't be hard, but doing it in a principled way is
> likely to require some thought. For example, in the COPY OUT case,
> presumably what's happening is that we palloc a chunk for each
> individual datum, and then palloc a buffer for the whole row.
Right. There's a serious problem here already, and users are being
bitten by it in existing releases. I think we should provide a
non-invasive fix for back-branches which we could apply as a bug fix.
Here's a proposed patch for the localized fix; it just adds a flag to
StringInfo allowing the string to grow beyond the 1GB limit, but only
for strings which are specifically marked. That way we avoid having to
audit all the StringInfo-using code. In this patch, only the COPY path
is allowed to grow beyond 1GB, which is enough to close the current
problem -- or at least my test case for it.
If others can try this patch to ensure it enables pg_dump to work on
their databases, it would be great.
(In this patch there's a known buglet which is that the UINT64_FORMAT
patched in the error message doesn't allow for translatability.)
Like Robert, I don't like this approach for the long term, however. I
think it would be saner to have CopySendData not keep a single gigantic
string in memory; it would be better to get the bytes out to the client
sooner than end-of-row. To avoid causing a performance hit, we would
only flush when the size of the output buffer is about to reach the
allocation limit (MaxAllocSize); so for regular-sized tuples, we would
only do it at end-of-row, keeping the current behavior. I don't have a
patch for this yet; I'm going to try that next.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
["huge-stringinfo.patch" (text/x-diff)]
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3eba9ef..fcc4fe6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -445,6 +445,15 @@ SendCopyEnd(CopyState cstate)
}
}
+/*
+ * Prepare for output
+ */
+static void
+CopyStartSend(CopyState cstate)
+{
+ allowLongStringInfo(cstate->fe_msgbuf);
+}
+
/*----------
* CopySendData sends output data to the destination (file or frontend)
* CopySendString does the same for null-terminated strings
@@ -1865,6 +1874,8 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
+ CopyStartSend(cstate);
+
if (cstate->binary)
{
/* Binary per-tuple header */
diff --git a/src/backend/lib/stringinfo.c b/src/backend/lib/stringinfo.c
index 7d03090..8c08eb7 100644
--- a/src/backend/lib/stringinfo.c
+++ b/src/backend/lib/stringinfo.c
@@ -47,12 +47,24 @@ initStringInfo(StringInfo str)
{
int size = 1024; /* initial default buffer size */
- str->data = (char *) palloc(size);
+ str->data = (char *) palloc(size); /* no need for "huge" at this point */
str->maxlen = size;
+ str->allowlong = false;
resetStringInfo(str);
}
/*
+ * allocLongStringInfo
+ *
+ * Mark the StringInfo as a candidate for a "huge" allocation
+ */
+void
+allowLongStringInfo(StringInfo str)
+{
+ str->allowlong = true;
+}
+
+/*
* resetStringInfo
*
* Reset the StringInfo: the data buffer remains valid, but its
@@ -64,6 +76,7 @@ resetStringInfo(StringInfo str)
str->data[0] = '\0';
str->len = 0;
str->cursor = 0;
+ str->allowlong = false;
}
/*
@@ -244,7 +257,8 @@ appendBinaryStringInfo(StringInfo str, const char *data, int datalen)
void
enlargeStringInfo(StringInfo str, int needed)
{
- int newlen;
+ int64 newlen;
+ Size limit;
/*
* Guard against out-of-range "needed" values. Without this, we can get
@@ -252,16 +266,19 @@ enlargeStringInfo(StringInfo str, int needed)
*/
if (needed < 0) /* should not happen */
elog(ERROR, "invalid string enlargement request size: %d", needed);
- if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
+
+ /* choose the proper limit and verify this allocation wouldn't exceed it */
+ limit = str->allowlong ? MaxAllocHugeSize : MaxAllocSize;
+ if (((Size) needed) >= (limit - (Size) str->len))
ereport(ERROR,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("out of memory"),
- errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
+ errdetail("Cannot enlarge string buffer containing "INT64_FORMAT" bytes by %d more bytes.",
str->len, needed)));
needed += str->len + 1; /* total space required now */
- /* Because of the above test, we now have needed <= MaxAllocSize */
+ /* Because of the above test, we now have needed <= limit */
if (needed <= str->maxlen)
return; /* got enough space already */
@@ -276,14 +293,14 @@ enlargeStringInfo(StringInfo str, int needed)
newlen = 2 * newlen;
/*
- * Clamp to MaxAllocSize in case we went past it. Note we are assuming
- * here that MaxAllocSize <= INT_MAX/2, else the above loop could
+ * Clamp to the limit in case we went past it. Note we are assuming
+ * here that MaxAllocHugeSize <= INT64_MAX/2, else the above loop could
* overflow. We will still have newlen >= needed.
*/
- if (newlen > (int) MaxAllocSize)
- newlen = (int) MaxAllocSize;
+ if (newlen > (int64) limit)
+ newlen = (int64) limit;
- str->data = (char *) repalloc(str->data, newlen);
+ str->data = (char *) repalloc_huge(str->data, newlen);
str->maxlen = newlen;
}
diff --git a/src/include/lib/stringinfo.h b/src/include/lib/stringinfo.h
index 4fff10a..cac6741 100644
--- a/src/include/lib/stringinfo.h
+++ b/src/include/lib/stringinfo.h
@@ -30,14 +30,17 @@
* cursor is initialized to zero by makeStringInfo or initStringInfo,
* but is not otherwise touched by the stringinfo.c routines.
* Some routines use it to scan through a StringInfo.
+ * allowlong boolean flag indicating whether this StringInfo can allocate
+ * more than MaxAllocSize bytes.
*-------------------------
*/
typedef struct StringInfoData
{
char *data;
- int len;
- int maxlen;
+ int64 len;
+ int64 maxlen;
int cursor;
+ bool allowlong;
} StringInfoData;
typedef StringInfoData *StringInfo;
@@ -79,6 +82,11 @@ extern StringInfo makeStringInfo(void);
extern void initStringInfo(StringInfo str);
/*------------------------
+ * Set flag to allow "huge" stringinfos.
+ */
+extern void allowLongStringInfo(StringInfo str);
+
+/*------------------------
* resetStringInfo
* Clears the current content of the StringInfo, if any. The
* StringInfo remains valid.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic