'Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-bugs
Subject:    Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
From:       Thomas Munro <thomas.munro () gmail ! com>
Date:       2023-10-23 21:54:46
Message-ID: CA+hUKGK4omtUqvyP9rE2u=LtDUAH8WvzAp+zq9d2JG99ZajxwA () mail ! gmail ! com
[Download RAW message or body]

On Tue, Oct 24, 2023 at 4:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
> But within mdtruncate(), we've got more than one problem, I think.
> mdnblocks() is a problem because of the reason that you mention, but
> register_dirty_segment() doesn't look totally safe either, because it
> can call RegisterSyncRequest() which, in a standalone backend, can
> call RememberSyncRequest().

But that is already enkludgified thusly:

        /*
         * XXX: The checkpointer needs to add entries to the pending ops table
         * when absorbing fsync requests.  That is done within a critical
         * section, which isn't usually allowed, but we make an exception. It
         * means that there's a theoretical possibility that you run out of
         * memory while absorbing fsync requests, which leads to a PANIC.
         * Fortunately the hash table is small so that's unlikely to happen in
         * practice.
         */
        pendingOpsCxt = AllocSetContextCreate(TopMemoryContext,
                                              "Pending ops context",
                                              ALLOCSET_DEFAULT_SIZES);
        MemoryContextAllowInCriticalSection(pendingOpsCxt, true);


> In general, it seems like it would be a lot nicer if we were doing a
> lot less stuff inside the critical section here. So I think you're
> right that we need some refactoring. Maybe smgr_prepare_truncate() and
> smgr_execute_truncate() or something like that. I wonder if we could
> actually register the dirty segment in the "prepare" phase - is it bad
> if we register a dirty segment before actually dirtying it? And maybe
> even CacheInvalidateSmgr() could be done at that stage? It seems
> pretty clear that dropping the dirty buffers and actually truncating
> the relation on disk need to happen after we've entered the critical
> section, because if we fail after doing the former, we've thrown away
> dirty data in anticipation of performing an operation that didn't
> happen, and if we fail when attempting the latter, primaries and
> standbys diverge and the originally-reported bug on this thread
> happens. But we'd like to move as much other stuff as we can out of
> that critical section.

Hmm, yeah it seems like that direction would be a nice improvement, as
long as we are sure that the fsync request can't be processed too
soon.


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic