[prev in list] [next in list] [prev in thread] [next in thread]
List: postgresql-admin
Subject: Re: Upgrading from Postgresql 9.3.8 to 9.6.10
From: Keith <keith () keithf4 ! com>
Date: 2019-04-25 22:17:05
Message-ID: CAHw75vtMsXoo-n903JZ9E8COe9FDhFQRj8mfPji1hPPFJ7cXsw () mail ! gmail ! com
[Download RAW message or body]
On Thu, Apr 25, 2019 at 4:43 PM Yuri Niyazov <yuri@academia.edu> wrote:
> On Mon, Mar 25, 2019 at 11:44 PM Keith <keith@keithf4.com> wrote:
>
>> On Tue, Mar 26, 2019 at 1:51 AM Yuri Niyazov <yuri@academia.edu> wrote:
>>
>>> We have an old cluster, running 9.3.8, that we are trying to upgrade to
>>> 9.6.10.
>>>
>>> After running pg_upgrade and starting the server, and testing some
>>> common queries against it, we experienced the following error:
>>>
>>> PG::IndexCorrupted: ERROR: index "table_pkey" contains unexpected zero
>>> page at block 17021871 HINT: Please REINDEX it.
>>>
>>> So, if I am reading this correctly, there is an index that 9.3.8
>>> created, and it recognizes that index as OK, but 9.6.10 thinks that index
>>> is corrupt. Since this happened on one index, it seems reasonable to assume
>>> that this could happen on any index in that database, and we need to
>>> reindex the entire database.
>>>
>>> Now, this is a live application, and we would like to minimize
>>> continuous downtime (multiple short downtimes are fine), so what we are
>>> considering doing is a manual reindex: for each index, create a new one,
>>> and then drop the old one, and after all that, upgrade.
>>>
>>> However, we are leery of doing this reindexing using 9.3.8, since it's
>>> already demonstrated itself to be unreliable.
>>>
>>> Which version should we use to reindex? Just the latest 9.3.X? Was there
>>> a known bug with older versions missing corrupted pages in indices?
>>>
>>> PS: Just in case I am glaringly doing something wrong:
>>>
>>> Here's our mechanism of doing the upgrade: we create a streaming replica
>>> from a basebackup, and then at some point turn off writes to the primary,
>>> convert the replica to another primary, and then run pg_upgrade on this new
>>> primary.
>>>
>>
>>
>> Have you done this more than once? If so, is it the same indexes every
>> time you're having issues with?
>>
>> Answering these questions will help you narrow down if the corruption is
>> in your original database or something to do with the upgrade.
>>
>> However, I would also highly recommend getting to the latest version of
>> 9.3 and rebuilding your replicas afterwards. The latest patch release was
>> 9.3.25, so your current version is quite far behind. There were quite a
>> number of data corruption issues related to replication in the 9.3 series.
>> More-so in the earlier patch versions, but I highly recommend also trying
>> this before your major upgrade.
>>
>> Keith
>>
>
> We tried it a second time, and were unable to reproduce the issue, so it
> seems like the issue is non-deterministic, which is a little scary. In any
> case, we then proceeded to upgrade to 9.3.25 as you recommended, and then
> upgraded to the 9.6 series from it, and the issue didn't reproduce; we've
> been running on the new upgraded version for thel last two weeks and it
> looks ok so far.
>
> Thanks for your help!
>
>
Yuri,
Glad things are working, but I would highly recommend running a reindex on
all tables in your cluster at some point, just as a precaution. It may not
hurt to also run a pg_dump/restore of the database to ensure it doesn't
find any corrupt data during that process as well. Don't mean that you have
to rebuild your current system, just make a dump and restore it to a
secondary system somewhere and make sure it all works ok.
Keith
[Attachment #3 (text/html)]
<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Thu, Apr 25, 2019 at 4:43 PM Yuri Niyazov <<a \
href="mailto:yuri@academia.edu">yuri@academia.edu</a>> wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Mon, Mar 25, 2019 \
at 11:44 PM Keith <<a href="mailto:keith@keithf4.com" \
target="_blank">keith@keithf4.com</a>> wrote:<br></div><div \
class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div \
dir="ltr">On Tue, Mar 26, 2019 at 1:51 AM Yuri Niyazov <<a \
href="mailto:yuri@academia.edu" target="_blank">yuri@academia.edu</a>> \
wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" \
style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>We have an old \
cluster, running 9.3.8, that we are trying to upgrade to \
9.6.10.<br></div><div><br></div><div>After running pg_upgrade and starting the \
server, and testing some common queries against it, we experienced the following \
error:<br></div><div><br></div><div>PG::IndexCorrupted: ERROR: index \
"table_pkey" contains unexpected zero page at block 17021871 HINT: Please \
REINDEX it.<br></div><div><br></div><div>So, if I am reading this correctly, there \
is an index that 9.3.8 created, and it recognizes that index as OK, but 9.6.10 thinks \
that index is corrupt. Since this happened on one index, it seems reasonable to \
assume that this could happen on any index in that database, and we need to reindex \
the entire database. </div><div><br></div><div>Now, this is a live application, and \
we would like to minimize continuous downtime (multiple short downtimes are fine), so \
what we are considering doing is a manual reindex: for each index, create a new one, \
and then drop the old one, and after all that, \
upgrade.<br></div><div><br></div><div>However, we are leery of doing this reindexing \
using 9.3.8, since it's already demonstrated itself to be unreliable. \
</div><div><br></div><div>Which version should we use to reindex? Just the latest \
9.3.X? Was there a known bug with older versions missing corrupted pages in indices? \
</div><div><br></div><div>PS: Just in case I am glaringly doing something \
wrong:</div><div><br></div><div>Here's our mechanism of doing the upgrade: we \
create a streaming replica from a basebackup, and then at some point turn off writes \
to the primary, convert the replica to another primary, and then run pg_upgrade on \
this new primary. </div></div></div></blockquote><div><br></div><div><br></div><div>Have \
you done this more than once? If so, is it the same indexes every time you're \
having issues with?</div><div><br></div><div>Answering these questions will help you \
narrow down if the corruption is in your original database or something to do with \
the upgrade.</div><div><br></div><div>However, I would also highly recommend getting \
to the latest version of 9.3 and rebuilding your replicas afterwards. The latest \
patch release was 9.3.25, so your current version is quite far behind. There were \
quite a number of data corruption issues related to replication in the 9.3 series. \
More-so in the earlier patch versions, but I highly recommend also trying this before \
your major upgrade.</div><div><br></div><div>Keith<br></div></div></div></blockquote><div><br></div><div>We \
tried it a second time, and were unable to reproduce the issue, so it seems like the \
issue is non-deterministic, which is a little scary. In any case, we then proceeded \
to upgrade to 9.3.25 as you recommended, and then upgraded to the 9.6 series from it, \
and the issue didn't reproduce; we've been running on the new upgraded \
version for thel last two weeks and it looks ok so far. \
</div><div><br></div><div>Thanks for your \
help!<br><br></div></div></div></blockquote><div><br></div><div>Yuri,</div><div><br></div><div>Glad \
things are working, but I would highly recommend running a reindex on all tables in \
your cluster at some point, just as a precaution. It may not hurt to also run a \
pg_dump/restore of the database to ensure it doesn't find any corrupt data during \
that process as well. Don't mean that you have to rebuild your current system, \
just make a dump and restore it to a secondary system somewhere and make sure it all \
works ok.</div><div><br></div><div>Keith<br></div></div></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic