[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-bugs
Subject:    Re: [BUGS] BUG #11264: Auto vacuum wraparound job blocking everything
From:       David Gould <daveg () sonic ! net>
Date:       2014-09-30 0:44:11
Message-ID: loom.20140930T022539-927 () post ! gmane ! org
[Download RAW message or body]

Alvaro Herrera <alvherre <at> 2ndquadrant.com> writes:

> Well, yes, 9.3.4 had a bug fixed by this commit:
> 
> Author: Bruce Momjian <bruce <at> momjian.us>
> Branch: master [a61daa14d] 2014-07-02 15:29:38 -0400
> Branch: REL9_4_STABLE [b446a384b] 2014-07-02 15:29:38 -0400
> Branch: REL9_3_STABLE Release: REL9_3_5 [3d2e18510] 2014-07-02 15:29:38 
-0400
> 
>     pg_upgrade:  preserve database and relation minmxid values
> 
>     Also set these values for pre-9.3 old clusters that don't have values 
to
>     preserve.
> 
>     Analysis by Alvaro
> 
>     Backpatch through 9.3
> 
> > How do we fix the current issue with this one server? Is there an easy 
fix?
> > Thanks.
> 
> As far as I am aware, you should
>   UPDATE pg_database SET datminmxid=20783
> 
> and that should fix it.  The oldestMulti value in pg_control would get
> updated by itself some time later.  If you experience stalls before
> oldestMulti fixes itself, you could stop the server (cleanly!) and then
> pg_resetxlog -m x,y where x is the correct nextMulti value from
> pg_controldata and y is 20783.

We ran into this on two instances last night and I'm starting to worry that 
many others may hit it as well. As far as I can tell any postgres instance 
that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk as they all 
will have:

  Latest checkpoint's oldestMultiXid:   1

The workaround above is good if you still have the old cluster around from 
before the upgrade. We did not, that was all cleaned up months ago. Which 
raises the question: how do you fix a 9.3 instance that has oldestMultiXid = 
1 without access to the pre-upgrade instance. That is, where do you get the 
correct value of oldestMultiXid to set into pg_database.datxminmxid?

I took a guess that the oldest pg_class.relminmxid for all the tables in all 
the databases would be ok and updated pg_database.datminmxid witt that. That 
is, in each database I ran:

  select relname, relminmxid, age(relminmxid)
    from pg_class
    where relkind = 'r'
      and age(relminmxid) < 2147483647
    order by 3 desc limit 1 ;

And then used the oldest one to update pg_database.datminmxid. After a 
checkpoint and shutdown/restart I see that it was written to pg_controldata
too. Afterwards I was able to run a vacuum freeze on the problem table.

Questions:

1) Is the procedure above safe and effective for this, or did I just hose my 
databases?

2) If the procedure above is not safe, what do we do instead?

3) Is this likely to affect all 9.3.x<5 upgraded databases? If so, how do we 
fix it before too much damage happens in the wider world?

-dg

--
David Gould                               daveg <at> sonic.net
If simplicity worked, the world would be overrun with insects.




-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic