List:       pgsql-hackers
Subject:    Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
From:       Robert Haas <robertmhaas () gmail ! com>
Date:       2012-05-31 14:16:41
Message-ID: CA+TgmobXSwaEe8qVxa+50=Fk4iJMkJUmjSqJ4bHX8bMM-b10dg () mail ! gmail ! com

On Thu, May 31, 2012 at 9:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > The one thing that still seems a little odd to me is that this caused
> > a pin count to get orphaned.  It seems reasonable that ignoring the
> > AccessExclusiveLock could result in not-found errors trying to open a
> > missing relation, and even fsync requests on a missing relation.  But
> > I don't see why that would cause the backend-local pin counts to get
> > messed up, which makes me wonder if there really is another bug here
> > somewhere.
> 
> According to Heikki's log, the Assert was in the startup process itself,
> and it happened after an error:
> 
> > 2012-05-26 10:44:28.587 CEST 10270 FATAL:  could not open file "base/21268/32994": No such file or directory
> > 2012-05-26 10:44:28.588 CEST 10270 CONTEXT:  writing block 2508 of relation base/21268/32994
> >         xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 tuples
> > TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741)
> > 2012-05-26 10:44:31.131 CEST 10269 LOG:  startup process (PID 10270) was terminated by signal 6: Aborted
> 
> I don't think that code is meant to recover from errors anyway, so
> the fact that it fails with a pin count held isn't exactly surprising.
> But it might be worth looking at exactly which on_proc_exit callbacks
> are installed in the startup process and what assumptions they make.

Which code isn't meant to recover from errors?

> As for where the error came from in the first place, it's easy to
> imagine somebody who's not got the word about the AccessExclusiveLock
> reading pages of the table into buffers that have already been scanned
> by the DROP.  So you'd end up with orphaned buffers belonging to a
> vanished table.  If somebody managed to dirty them by setting hint bits
> (we do allow that in HS mode, no?) then later you'd have various processes
> trying to write the buffer before recycling it, which seems to fit the
> reported error.

Right, I understand the other errors.  It's just the pin count that I
am a bit confused about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

