
List:       pgsql-bugs
Subject:    Re: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 
From:       Thomas Munro <thomas.munro () gmail ! com>
Date:       2023-10-25 21:23:03
Message-ID: CA+hUKGJQrzNrXn1us_sYC9Djh9p7AQ1uPHWAjWhLzhX5YV-35w () mail ! gmail ! com

On Thu, Oct 26, 2023 at 3:44 AM Boyer, Maxime (he/him | il/lui)
<Maxime.Boyer@cra-arc.gc.ca> wrote:
> > FWIW, the PG code that throws that error message is old enough to vote;
> > it's not something we changed in a recent minor release.
> 
> Yeah, that's what I thought :'D
> 
> > I am guessing you saw the impact of some external event, but I don't know what.
> 
> Fair enough. This happened the day after reverting to 11, because of the memory
> error on 14, but I also doubt it's related. I was stopping one of the application
> node at the time. Maybe a Windows thing, or something related to the firmware
> updates.

Re-bonjour Maxime,

FWIW that comes from WSASocket() trying to inherit/duplicate a socket
used for communication with the pgstat process (a process and a socket
that don't exist in PostgreSQL 15, where that mechanism was replaced
with a new shared memory system; but given you were trying to upgrade
to 14 you probably don't want to hear about 15 today...).

I have no idea why that would happen, but for the record the manual[1] says:

"WSAEPROVIDERFAILEDINIT
10106
Service provider failed to initialize. The requested service provider
could not be loaded or initialized. This error is returned if either a
service provider's DLL could not be loaded (LoadLibrary failed) or the
provider's WSPStartup or NSPStartup function failed."

That seems pretty low level.  If this were PostgreSQL's fault I
suppose it would have to come from corruption of the WSAPROTOCOL_INFO
struct (a sort of cookie we need to duplicate the socket), but I doubt
it.  I see there were a few reports years ago about this error message
from pre-parallel-query times.  It's interesting that you see this
specifically with parallel workers (which inherit only a pgstat
socket, not the client connection socket).  The pgstat socket is
different in that it is a UDP socket.  I wonder if there is something
special about UDP that is upsetting your network stack, perhaps a
firewall thing somewhere that is upset specifically by some limit on
UDP activity or something.  But I'm not a Windows guy so I have no
real clue.
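
If it helps narrow things down, here is a rough diagnostic sketch in
Python (not PostgreSQL code, and the function names are made up for
illustration).  It assumes Python is available on the affected machine.
The first function sends a datagram from a loopback UDP socket to
itself, which is similar in spirit to how the pgstat socket is set up.
The second tries to duplicate a UDP socket with socket.share() and
socket.fromshare(), which exist only on Windows and, as far as I know,
go through the same WSADuplicateSocket()/WSASocket() path.  On other
platforms that step is skipped.

```python
import os
import socket


def udp_loopback_roundtrip(payload: bytes = b"x", timeout: float = 1.0) -> bytes:
    """Send one datagram from a loopback UDP socket to itself and read it back."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("127.0.0.1", 0))       # let the OS pick a free port
        sock.connect(sock.getsockname())  # "connect" the socket to its own address
        sock.settimeout(timeout)
        sock.send(payload)
        return sock.recv(len(payload))
    finally:
        sock.close()


def duplicate_udp_socket() -> str:
    """Try to duplicate a UDP socket; returns "ok", or "skipped" off Windows."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("127.0.0.1", 0))
        if not hasattr(sock, "share"):
            # socket.share()/socket.fromshare() are Windows-only.
            return "skipped"
        info = sock.share(os.getpid())  # serialized protocol info for this process
        dup = socket.fromshare(info)    # raises OSError if the provider fails
        dup.close()
        return "ok"
    finally:
        sock.close()


if __name__ == "__main__":
    print("loopback round trip:", udp_loopback_roundtrip(b"ping"))
    print("socket duplication:", duplicate_udp_socket())
```

If the duplication step raises OSError with code 10106 outside of
PostgreSQL too, that would point at the network stack or a layered
service provider rather than at the server.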

[1] https://learn.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2



