[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-hackers
Subject:    Re: [DOC] Document concurrent index builds waiting on each other
From:       James Coleman <jtc331 () gmail ! com>
Date:       2020-07-31 18:51:09
Message-ID: CAAaqYe9oEfbz7AxXq7OX+FFVi5w5p1e_Of8ON8ZnKO9QqBfmjg () mail ! gmail ! com
[Download RAW message or body]

On Thu, Jul 16, 2020 at 7:34 PM David Johnston
<david.g.johnston@gmail.com> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            tested, passed
>
> James,
>
> I'm on board with the point of pointing out explicitly the "concurrent in=
dex builds on multiple tables at the same time will not return on any one t=
able until all have completed", with back-patching.  I do not believe the n=
ew paragraph is necessary though.  I'd suggest trying to weave it into the =
existing paragraph ending "Even then, however, the index may not be immedia=
tely usable for queries: in the worst case, it cannot be used as long as tr=
ansactions exist that predate the start of the index build."  Adding "Notab=
ly, " in front of the existing sentence fragment above and tacking it onto =
the end probably suffices.

I'm not sure "the index may not be immediately usable for queries" is
really accurate/sufficient: it seems to imply the CREATE INDEX has
returned but for some reason the index isn't yet valid. The issue I'm
trying to describe here is that the CREATE INDEX query itself will not
return until all preceding queries have completed *including*
concurrent index creations on unrelated tables.

> I don't actually don't whether this is true behavior though.  Is it somet=
hing our tests do, or could, demonstrate?

It'd take tests that exercise parallelism, but it's pretty simple to
demonstrate (but you do have to catch the first index build in a scan
phase, so you either need lots of data or a hack). Here's an example
that uses a bit of a hack to simulate a slow scan phase:

Setup:
create table items(i int);
create table others(i int);
create function slow_expr() returns text as $$ select pg_sleep(15);
select '5'; $$ language sql immutable;
insert into items(i) values (1), (2);
insert into others(i) values (1), (2);

Then the following in order:
1. In session A: create index concurrently on items((i::text || slow_expr()=
));
2. In session B (at the same time): create index concurrently on others(i);

You'll notice that the 2nd command, which should be practically
instantaneous, waits on the first ~30s scan phase of (1) before it
returns. The same is true if after (2) completes you immediately run
it again -- it waits on the second ~30s scan phase of (1).

That does reveal a bit of complexity though that that the current
patch doesn't address, which is that this can be phase dependent (and
that complexity gets a lot more non-obvious when there's real live
activity (particularly long-running transactions) in the system as
well.

I've attached a new patch series with two items:
1. A simpler (and I believe more correct) doc changes for "cic blocks
cic on other tables".
2. A patch to document that all index builds can prevent tuples from
being vacuumed away on other tables.

If it's preferable we could commit the first and discuss the second
separately, but since that limitation was also discussed up-thread, I
decided to include them both here for now.

James

["v2-0001-Document-concurrent-indexes-waiting-on-each-other.patch" (application/octet-stream)]

From 79d1289d4cca25939d8887a133063b94dbc521fa Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Fri, 31 Jul 2020 12:28:54 -0400
Subject: [PATCH v2 1/2] Document concurrent indexes waiting on each other

Because regular CREATE INDEX commands are independent, and there's no
logical data dependency, it's not immediately obvious that transactions
held by concurrent index builds on one table will block the second
phase of concurrent index creation on an unrelated table, so document
this caveat.
---
 doc/src/sgml/ref/create_index.sgml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 33aa64e81d..a37342ee77 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -600,7 +600,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     wait for existing transactions that have modified the table to terminate.
     After the second scan, the index build must wait for any transactions
     that have a snapshot (see <xref linkend="mvcc"/>) predating the second
-    scan to terminate.  Then finally the index can be marked ready for use,
+    scan to terminate, including transactions used by any phase of concurrent
+    index builds on other tables.  Then finally the index can be marked ready for use,
     and the <command>CREATE INDEX</command> command terminates.
     Even then, however, the index may not be immediately usable for queries:
     in the worst case, it cannot be used as long as transactions exist that
-- 
2.20.1 (Apple Git-117)


["v2-0002-Document-vacuum-on-one-table-depending-on-concurr.patch" (application/octet-stream)]

From 6b9406559f6ffc17f1c238c8e5061ad880ac2c06 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc331@gmail.com>
Date: Fri, 31 Jul 2020 12:42:42 -0400
Subject: [PATCH v2 2/2] Document vacuum on one table depending on concurrent
 index creation

Because commands like VACUUM on one table are not considered when
determining what xids are safe to vacuum on another table, it's
operationally surprising that CREATE INDEX CONCURRENTLY on one table
is considered when determining what xids are safe to vacuum on another
table.
---
 doc/src/sgml/ref/create_index.sgml | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index a37342ee77..0c7b9f6139 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -808,6 +808,12 @@ Indexes:
    performed in parallel.
   </para>
 
+  <para>
+   <command>CREATE INDEX</command> (including the <literal>CONCURRENTLY</literal>
+   option) commands are included when <command>VACUUM</command> calculates what
+   dead tuples are safe to remove even on tables other than the one being indexed.
+  </para>
+
   <para>
    Use <xref linkend="sql-dropindex"/>
    to remove an index.
-- 
2.20.1 (Apple Git-117)



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic