List: pgsql-hackers
Subject: Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
From: Robert Haas <robertmhaas@gmail.com>
Date: 2012-05-31 14:16:41
Message-ID: CA+TgmobXSwaEe8qVxa+50=Fk4iJMkJUmjSqJ4bHX8bMM-b10dg@mail.gmail.com
On Thu, May 31, 2012 at 9:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > The one thing that still seems a little odd to me is that this caused
> > a pin count to get orphaned. It seems reasonable that ignoring the
> > AccessExclusiveLock could result in not-found errors trying to open a
> > missing relation, and even fsync requests on a missing relation. But
> > I don't see why that would cause the backend-local pin counts to get
> > messed up, which makes me wonder if there really is another bug here
> > somewhere.
>
> According to Heikki's log, the Assert was in the startup process itself,
> and it happened after an error:
>
> > 2012-05-26 10:44:28.587 CEST 10270 FATAL: could not open file "base/21268/32994": No such file or directory
> > 2012-05-26 10:44:28.588 CEST 10270 CONTEXT: writing block 2508 of relation base/21268/32994
> > xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 tuples
> > TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741)
> > 2012-05-26 10:44:31.131 CEST 10269 LOG: startup process (PID 10270) was terminated by signal 6: Aborted
>
> I don't think that code is meant to recover from errors anyway, so
> the fact that it fails with a pin count held isn't exactly surprising.
> But it might be worth looking at exactly which on_proc_exit callbacks
> are installed in the startup process and what assumptions they make.

Which code isn't meant to recover from errors?

> As for where the error came from in the first place, it's easy to
> imagine somebody who's not got the word about the AccessExclusiveLock
> reading pages of the table into buffers that have already been scanned
> by the DROP. So you'd end up with orphaned buffers belonging to a
> vanished table. If somebody managed to dirty them by setting hint bits
> (we do allow that in HS mode no?) then later you'd have various processes
> trying to write the buffer before recycling it, which seems to fit the
> reported error.

Right, I understand the other errors. It's just the pin count that I
am a bit confused about.
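
For what it's worth, the scenario Tom describes above (an error raised
while a buffer is still pinned, followed by an exit-time check) can be
modeled with a toy sketch. This is not the bufmgr.c code: only the name
PrivateRefCount comes from the assertion text, and NBUFFERS, pin_buffer,
unpin_buffer, write_block, count_leaked_pins and flush_then_exit are all
hypothetical names made up for illustration. It assumes the error path
skips the normal unpin unless some cleanup runs first.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS 4              /* hypothetical, tiny buffer pool */

/* Backend-local pin counts, one slot per buffer (toy stand-in). */
static int PrivateRefCount[NBUFFERS];

static void pin_buffer(int buf)
{
    assert(buf >= 0 && buf < NBUFFERS);
    PrivateRefCount[buf]++;
}

static void unpin_buffer(int buf)
{
    assert(buf >= 0 && buf < NBUFFERS);
    PrivateRefCount[buf]--;
}

/* Stand-in for an exit-time check: how many buffers are still pinned? */
static int count_leaked_pins(void)
{
    int leaked = 0;

    for (int i = 0; i < NBUFFERS; i++)
        if (PrivateRefCount[i] != 0)
            leaked++;
    return leaked;
}

/* Hypothetical write: fails when the underlying file has vanished. */
static bool write_block(bool file_exists)
{
    return file_exists;
}

/*
 * Pin a buffer, try to write it, and then "exit".  Returns the number of
 * pins the exit-time check would find.  If the write fails and nothing
 * cleans up on the error path, the pin taken above is never released.
 */
int flush_then_exit(bool file_exists, bool cleanup_on_error)
{
    for (int i = 0; i < NBUFFERS; i++)
        PrivateRefCount[i] = 0;

    pin_buffer(0);
    if (!write_block(file_exists))
    {
        /* Error path: the normal unpin below is skipped. */
        if (cleanup_on_error)
            unpin_buffer(0);
        return count_leaked_pins();
    }
    unpin_buffer(0);
    return count_leaked_pins();
}
```

In this model, flush_then_exit(false, false) leaves one pin for the exit
check to trip over, while flush_then_exit(false, true) and
flush_then_exit(true, false) both leave none. Whether the startup process
has any such cleanup on its error path is exactly the open question.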
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company