'Re: Upgrading from Postgresql 9.3.8 to 9.6.10'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-admin
Subject:    Re: Upgrading from Postgresql 9.3.8 to 9.6.10
From:       Keith <keith () keithf4 ! com>
Date:       2019-04-25 22:17:05
Message-ID: CAHw75vtMsXoo-n903JZ9E8COe9FDhFQRj8mfPji1hPPFJ7cXsw () mail ! gmail ! com
[Download RAW message or body]

On Thu, Apr 25, 2019 at 4:43 PM Yuri Niyazov <yuri@academia.edu> wrote:

> On Mon, Mar 25, 2019 at 11:44 PM Keith <keith@keithf4.com> wrote:
>
>> On Tue, Mar 26, 2019 at 1:51 AM Yuri Niyazov <yuri@academia.edu> wrote:
>>
>>> We have an old cluster, running 9.3.8, that we are trying to upgrade to
>>> 9.6.10.
>>>
>>> After running pg_upgrade and starting the server, and testing some
>>> common queries against it, we experienced the following error:
>>>
>>> PG::IndexCorrupted: ERROR: index "table_pkey" contains unexpected zero
>>> page at block 17021871 HINT: Please REINDEX it.
>>>
>>> So, if I am reading this correctly, there is an index that 9.3.8
>>> created, and it recognizes that index as OK, but 9.6.10 thinks that index
>>> is corrupt. Since this happened on one index, it seems reasonable to assume
>>> that this could happen on any index in that database, and we need to
>>> reindex the entire database.
>>>
>>> Now, this is a live application, and we would like to minimize
>>> continuous downtime (multiple short downtimes are fine), so what we are
>>> considering doing is a manual reindex: for each index, create a new one,
>>> and then drop the old one, and after all that, upgrade.
>>>
>>> However, we are leery of doing this reindexing using 9.3.8, since it's
>>> already demonstrated itself to be unreliable.
>>>
>>> Which version should we use to reindex? Just the latest 9.3.X? Was there
>>> a known bug with older versions missing corrupted pages in indices?
>>>
>>> PS:  Just in case I am glaringly doing something wrong:
>>>
>>> Here's our mechanism of doing the upgrade: we create a streaming replica
>>> from a basebackup, and then at some point turn off writes to the primary,
>>> convert the replica to another primary, and then run pg_upgrade on this new
>>> primary.
>>>
>>
>>
>> Have you done this more than once? If so, is it the same indexes every
>> time you're having issues with?
>>
>> Answering these questions will help you narrow down if the corruption is
>> in your original database or something to do with the upgrade.
>>
>> However, I would also highly recommend getting to the latest version of
>> 9.3 and rebuilding your replicas afterwards. The latest patch release was
>> 9.3.25, so your current version is quite far behind. There were quite a
>> number of data corruption issues related to replication in the 9.3 series.
>> More-so in the earlier patch versions, but I highly recommend also trying
>> this before your major upgrade.
>>
>> Keith
>>
>
> We tried it a second time, and were unable to reproduce the issue, so it
> seems like the issue is non-deterministic, which is a little scary. In any
> case, we then proceeded to upgrade to 9.3.25 as you recommended, and then
> upgraded to the 9.6 series from it, and the issue didn't reproduce; we've
> been running on the new upgraded version for thel last two weeks and it
> looks ok so far.
>
> Thanks for your help!
>
>
Yuri,

Glad things are working, but I would highly recommend running a reindex on
all tables in your cluster at some point, just as a precaution. It may not
hurt to also run a pg_dump/restore of the database to ensure it doesn't
find any corrupt data during that process as well. Don't mean that you have
to rebuild your current system, just make a dump and restore it to a
secondary system somewhere and make sure it all works ok.

Keith

[Attachment #3 (text/html)]

<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Thu, Apr 25, 2019 at 4:43 PM Yuri Niyazov &lt;<a \
href="mailto:yuri@academia.edu">yuri@academia.edu</a>&gt; wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Mon, Mar 25, 2019 \
at 11:44 PM Keith &lt;<a href="mailto:keith@keithf4.com" \
target="_blank">keith@keithf4.com</a>&gt; wrote:<br></div><div \
class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div \
dir="ltr">On Tue, Mar 26, 2019 at 1:51 AM Yuri Niyazov &lt;<a \
href="mailto:yuri@academia.edu" target="_blank">yuri@academia.edu</a>&gt; \
wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" \
style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>We have an old \
cluster, running 9.3.8, that we are trying to upgrade to \
9.6.10.<br></div><div><br></div><div>After running pg_upgrade and starting the \
server, and testing some common queries against it, we experienced the following \
error:<br></div><div><br></div><div>PG::IndexCorrupted: ERROR: index \
&quot;table_pkey&quot; contains unexpected zero page at block 17021871 HINT: Please \
REINDEX it.<br></div><div><br></div><div>So, if I am reading this  correctly, there \
is an index that 9.3.8 created, and it recognizes that index as OK, but 9.6.10 thinks \
that index is corrupt. Since this happened on one index, it seems reasonable to \
assume that this could happen on any index in that database, and we need to reindex \
the entire database.  </div><div><br></div><div>Now, this is a live application, and \
we would like to minimize continuous downtime (multiple short downtimes are fine), so \
what we are considering doing is a manual reindex: for each index, create a new one, \
and then drop the old one, and after all that, \
upgrade.<br></div><div><br></div><div>However, we are leery of doing this reindexing \
using 9.3.8, since it&#39;s already demonstrated itself to be unreliable.  \
</div><div><br></div><div>Which version should we use to reindex? Just the latest \
9.3.X? Was there a known bug with older versions missing corrupted pages in indices?  \
</div><div><br></div><div>PS:   Just in case I am glaringly doing something \
wrong:</div><div><br></div><div>Here&#39;s our mechanism of doing the upgrade: we \
create a streaming replica from a basebackup, and then at some point turn off writes \
to the primary, convert the replica to another primary, and then run pg_upgrade on \
this new primary.  </div></div></div></blockquote><div><br></div><div><br></div><div>Have \
you done this more than once? If so, is it the same indexes every time you&#39;re \
having issues with?</div><div><br></div><div>Answering these questions will help you \
narrow down if the corruption is in your original database or something to do with \
the upgrade.</div><div><br></div><div>However, I would also highly recommend getting \
to the latest version of 9.3 and rebuilding your replicas afterwards. The latest \
patch release was 9.3.25, so your current version is quite far behind. There were \
quite a number of data corruption issues related to replication in the 9.3 series. \
More-so in the earlier patch versions, but I highly recommend also trying this before \
your major upgrade.</div><div><br></div><div>Keith<br></div></div></div></blockquote><div><br></div><div>We \
tried it a second time, and were unable to reproduce the issue, so it seems like the \
issue is non-deterministic, which is a little scary. In any case, we then proceeded \
to upgrade to 9.3.25 as you recommended, and then upgraded to the 9.6 series from it, \
and the issue didn&#39;t reproduce; we&#39;ve been running on the new upgraded \
version for thel last two weeks and it looks ok so far.  \
</div><div><br></div><div>Thanks for your \
help!<br><br></div></div></div></blockquote><div><br></div><div>Yuri,</div><div><br></div><div>Glad \
things are working, but I would highly recommend running a reindex on all tables in \
your cluster at some point, just as a precaution. It may not hurt to also run a \
pg_dump/restore of the database to ensure it doesn&#39;t find any corrupt data during \
that process as well. Don&#39;t mean that you have to rebuild your current system, \
just make a dump and restore it to a secondary system somewhere and make sure it all \
works ok.</div><div><br></div><div>Keith<br></div></div></div>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic