[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    standby fails with out-of-order XID insertion
From:       Radoslav Nedyalkov <rnedyalkov () gmail ! com>
Date:       2021-03-31 15:51:48
Message-ID: CANhtRiZM-g7BU7XmCLG5+Md8xk6E7D8chqc+T=zrUydO4cD4bg () mail ! gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


Hi all,
So we have master(pg11.8) and standbyA(pg11.11), 24T data, quite busy
data-warehouse on Amazon Linux.
We've built a standbyB(pg11.11) by creating basebackup  from standbyA and
collecting wals from the master.
When  basebackup completed we started recovery from collected wals.
When standbyB caught-up we tried to switch to streaming.
1. created slot on the master
2. activated slot (with short pg_receivewal invocation)
3. did a checkpoint on the master (not sure why)
4. cancelled pg_receivewal.
5. stopped the standbyB and started it. Startup fails with
FATAL:  out-of-order XID insertion in KnownAssignedXids
More log attached.

after switching back to log-shipping same error occurs.

Any clues why/what is the error ?

Thanks and regards,
Radoslav

[Attachment #5 (text/html)]

<div dir="ltr">Hi all,<div>So we have master(pg11.8) and standbyA(pg11.11), 24T data, \
quite busy data-warehouse on Amazon Linux.</div><div>We&#39;ve built a \
standbyB(pg11.11) by creating basebackup   from standbyA and collecting wals from the \
master.</div><div>When   basebackup completed we started recovery from collected \
wals.</div><div>When standbyB caught-up we tried to switch to streaming.</div><div>1. \
created slot  on the master</div><div>2. activated slot (with short pg_receivewal \
invocation)</div><div>3. did a checkpoint on the master (not sure why)</div><div>4. \
cancelled pg_receivewal.</div><div>5. stopped the standbyB and  started it. Startup \
fails with  </div><div>FATAL:   out-of-order XID insertion in \
KnownAssignedXids<br></div><div>More log attached.</div><div><br></div><div>after \
switching back to log-shipping same error occurs.</div><div><br></div><div>Any clues \
why/what is the error ?</div><div><br></div><div>Thanks and \
regards,</div><div>Radoslav</div></div>

--00000000000060203c05bed71868--


["fatal.txt" (text/plain)]

2021-03-31 14:22:36 UTC [41623] 0: [8-1] user=,db= host=,app=[] LOG:  received fast \
shutdown request 2021-03-31 14:22:36 UTC [41623] 0: [9-1] user=,db= host=,app=[] LOG: \
aborting any active transactions 2021-03-31 14:22:37 UTC [41642] 0: [1483-1] \
user=,db= host=,app=[] LOG:  restartpoint complete: wrote 753696 buffers (2.9%); 0 \
WAL file(s) added, 0 removed, 345 recycled; write=122.672 s, sync=0.063 s, \
total=122.871 s; sync files=2588, longest=0.001 s, average=0.001 s; distance=6276805 \
kB, estimate=7385309 kB 2021-03-31 14:22:37 UTC [41642] 0: [1484-1] user=,db= \
host=,app=[] LOG:  recovery restart point at A8EA1/298E6BB8 2021-03-31 14:22:37 UTC \
[41642] 0: [1485-1] user=,db= host=,app=[] DETAIL:  Last completed transaction was at \
log time 2021-03-31 14:22:35.091945+00. 2021-03-31 14:22:38 UTC [41642] 0: [1486-1] \
user=,db= host=,app=[] LOG:  shutting down 2021-03-31 14:22:38 UTC [41642] 0: \
[1487-1] user=,db= host=,app=[] LOG:  restartpoint starting: shutdown immediate \
2021-03-31 14:22:41 UTC [41642] 0: [1488-1] user=,db= host=,app=[] LOG:  restartpoint \
complete: wrote 304044 buffers (1.2%); 0 WAL file(s) added, 0 removed, 0 recycled; \
write=3.318 s, sync=0.027 s, total=3.348 s; sync files=2154, longest=0.001 s, \
average=0.001 s; distance=4314192 kB, estimate=7078198 kB 2021-03-31 14:22:41 UTC \
[41642] 0: [1489-1] user=,db= host=,app=[] LOG:  recovery restart point at \
A8EA2/30DFAEA8 2021-03-31 14:22:41 UTC [41642] 0: [1490-1] user=,db= host=,app=[] \
DETAIL:  Last completed transaction was at log time 2021-03-31 14:22:35.091945+00. \
2021-03-31 14:22:41 UTC [41623] 0: [10-1] user=,db= host=,app=[] LOG:  database \
system is shut down 2021-03-31 14:23:03 UTC [84999] 0: [1-1] user=,db= host=,app=[] \
LOG:  database system was shut down in recovery at 2021-03-31 14:22:41 UTC 2021-03-31 \
14:23:03 UTC [84999] 0: [2-1] user=,db= host=,app=[] LOG:  entering standby mode \
2021-03-31 14:23:03 UTC [84999] 0: [3-1] user=,db= host=,app=[] LOG:  recovered \
replication state of node 1 to 526/409B7D28 2021-03-31 14:23:03 UTC [84999] 0: [4-1] \
user=,db= host=,app=[] LOG:  recovered replication state of node 2 to 8B75/A97FC4E0 \
2021-03-31 14:23:03 UTC [84999] 0: [5-1] user=,db= host=,app=[] LOG:  recovered \
replication state of node 19 to 580/B02F81A8 2021-03-31 14:23:03 UTC [84999] 0: [6-1] \
user=,db= host=,app=[] LOG:  recovered replication state of node 6 to 580/B03B13C0 \
2021-03-31 14:23:03 UTC [84999] 0: [7-1] user=,db= host=,app=[] LOG:  recovered \
replication state of node 5 to 228D/B578FBA8 2021-03-31 14:23:03 UTC [84999] 0: [8-1] \
user=,db= host=,app=[] LOG:  recovered replication state of node 3 to 5F4/6B0D7560 \
2021-03-31 14:23:03 UTC [84999] 0: [9-1] user=,db= host=,app=[] LOG:  recovered \
replication state of node 7 to 65F/9971C210 2021-03-31 14:23:03 UTC [84999] 0: [10-1] \
user=,db= host=,app=[] LOG:  recovered replication state of node 9 to 65F/B19E5C38 \
2021-03-31 14:23:03 UTC [84999] 0: [11-1] user=,db= host=,app=[] LOG:  recovered \
replication state of node 10 to 3D8/2E753F28 2021-03-31 14:23:03 UTC [84999] 0: \
[12-1] user=,db= host=,app=[] LOG:  recovered replication state of node 12 to \
526/40791D18 2021-03-31 14:23:03 UTC [84999] 0: [13-1] user=,db= host=,app=[] LOG:  \
recovered replication state of node 13 to 521/E87DE3B8 2021-03-31 14:23:03 UTC \
[84999] 0: [14-1] user=,db= host=,app=[] LOG:  recovered replication state of node 4 \
to 526/409B7EE8 2021-03-31 14:23:03 UTC [84999] 0: [15-1] user=,db= host=,app=[] LOG: \
recovered replication state of node 15 to 65F/B1817750 2021-03-31 14:23:03 UTC \
[84999] 0: [16-1] user=,db= host=,app=[] LOG:  recovered replication state of node 14 \
to 200/20004088 2021-03-31 14:23:03 UTC [84999] 0: [17-1] user=,db= host=,app=[] LOG: \
recovered replication state of node 17 to 63F/5CF62898 2021-03-31 14:23:03 UTC \
[84999] 0: [18-1] user=,db= host=,app=[] LOG:  recovered replication state of node 18 \
to 96D/9D63BD30 2021-03-31 14:23:03 UTC [84999] 0: [19-1] user=,db= host=,app=[] LOG: \
recovered replication state of node 16 to DA/43CB4728 2021-03-31 14:23:03 UTC [84999] \
0: [20-1] user=,db= host=,app=[] LOG:  recovered replication state of node 20 to \
580/B03AFB58 2021-03-31 14:23:03 UTC [84999] 0: [21-1] user=,db= host=,app=[] LOG:  \
recovered replication state of node 11 to 580/B02A63C8 2021-03-31 14:23:03 UTC \
[84999] 0: [22-1] user=,db= host=,app=[] LOG:  recovered replication state of node 21 \
to 37C/BC042508 2021-03-31 14:23:03 UTC [84999] 0: [23-1] user=,db= host=,app=[] LOG: \
recovered replication state of node 22 to 319/ECA0DCF0 2021-03-31 14:23:03 UTC \
[84999] 0: [24-1] user=,db= host=,app=[] LOG:  redo starts at A8EA2/30DFAEA8 \
2021-03-31 14:23:04 UTC [84999] 0: [25-1] user=,db= host=,app=[] LOG:  11 \
KnownAssignedXids (num=11 tail=0 head=11) [0]=27311547 [1]=41706387 [2]=41821378 \
[3]=44370470 [4]=44843675 [5]=51340860 [6]=53056024 [7]=53575629 [8]=53575647 \
[9]=53575669 [10]=55152872  2021-03-31 14:23:04 UTC [84999] 0: [26-1] user=,db= \
host=,app=[] CONTEXT:  WAL redo at A8EA2/3A900150 for Standby/RUNNING_XACTS: nextXid \
55152875 latestCompletedXid 55152874 oldestRunningXid 4260572256; 13 xacts: 53056024 \
53575669 41821378 53575647 41706387 53575629 4260572256 51340860 54917157 44843675 \
44370470 27311547 55152872 2021-03-31 14:23:04 UTC [84999] 0: [27-1] user=,db= \
host=,app=[] FATAL:  out-of-order XID insertion in KnownAssignedXids 2021-03-31 \
14:23:04 UTC [84999] 0: [28-1] user=,db= host=,app=[] CONTEXT:  WAL redo at \
A8EA2/3A900150 for Standby/RUNNING_XACTS: nextXid 55152875 latestCompletedXid \
55152874 oldestRunningXid 4260572256; 13 xacts: 53056024 53575669 41821378 53575647 \
41706387 53575629 4260572256 51340860 54917157 44843675 44370470 27311547 55152872 \
2021-03-31 14:23:04 UTC [84985] 0: [7-1] user=,db= host=,app=[] LOG:  startup process \
(PID 84999) exited with exit code 1 2021-03-31 14:23:04 UTC [84985] 0: [8-1] \
user=,db= host=,app=[] LOG:  terminating any other active server processes



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic