
List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] pgsql RA improvements
From:       "Serge Dubrouski" <sergeyfd () gmail ! com>
Date:       2007-02-26 18:24:51
Message-ID: 868cbbaa0702261024t6c44ce4l825dd1399d252536 () mail ! gmail ! com

There were some more problems besides the variable initialization. A
patch is attached. I tested it and it seems to work fine.

On 2/26/07, Andrew Beekhof <beekhof@gmail.com> wrote:
> On 2/26/07, Serge Dubrouski <sergeyfd@gmail.com> wrote:
> > You broke it:
> >
> >  ./pgsql start
> > Usage: grep [OPTION]... PATTERN [FILE]...
> > Try `grep --help' for more information.
> > chown: missing operand after `:'
> > Try `chown --help' for more information.
> > 2007/02/26_12:50:26 ERROR: Can't start PostgreSQL.
> >
> > The reason for these errors is the changed way of initializing
>
> sorry - i've pushed up a fix
>
> > variables. Also, I still don't like that indefinite loop on start
> > because it makes it harder to troubleshoot manually if
> > PostgreSQL doesn't start.
>
> then add a call to ocf_log which indicates the RA is retrying or some-such
>
> the RA is definitely not the best place to set limits on how long a
> resource can take to start.
>
> at the very least it leads to confusion when the timeout is less than
> an RA's internal limit.  on the other hand, if the internal limit is
> lower than the timeout, then you're returning before you needed to.
>
> it is also not reliable if any part of the RA can block.
>
> > I don't know what the right way to fix those problems is now: fix your
> > version of the script or fix the previous one.
> >
> > On 2/26/07, Andrew Beekhof <beekhof@gmail.com> wrote:
> > > i made some further improvements in:
> > >    http://hg.beekhof.net/lha/crm-dev/rev/2e9b22cfb7e1
> > >
> > > On 2/26/07, Keisuke MORI <kskmori@intellilink.co.jp> wrote:
> > > > "Serge Dubrouski" <sergeyfd@gmail.com> writes:
> > > > >> "Serge Dubrouski" <sergeyfd@gmail.com> writes:
> > > > >>
> > > > >> > And I don't like the idea of removing the PID file in the "start"
> > > > >> > function. The standard approach is to remove it after stopping the
> > > > >> > application. Otherwise it could lead to an attempt to start a second
> > > > >> > copy of the application.
> > > > >>
> > > > >> This is necessary for the recovery from the power failure of the
> > > > >> primary node, for example. There is no chance to cleanup by stop
> > > > >> in such cases.
> > > > >>
> > > > >> Duplicate starting is avoided by checking if the postmaster
> > > > >> process exists beforehand, as the original script does.
> > > > >
> > > > > Yes, but in this case you remove the legitimate pid file of the
> > > > > running instance. You remove it before checking for the
> > > > > postmaster.
> > > >
> > > > Well, I think that the script checks for the postmaster first
> > > > and removes the file second (only when no postmaster process exists).
> > > >
> > > > Here's the code snip with my patch.
> > > > pgsql_status checks for it and I think it should be good enough.
> > > > ----8<--------8<--------8<--------8<--------8<--------8<--------8<--------8<----
> > > > pgsql_start() {
> > > >     if pgsql_status
> > > >     then
> > > >         ocf_log info "PostgreSQL is already running. PID=`cat $PIDFILE`"
> > > >         return $OCF_SUCCESS
> > > >     fi
> > > >
> > > >     if [ -x $PGCTL ]
> > > >     then
> > > >         # Remove postmaster.pid if it exists
> > > >         rm -f $PIDFILE
> > > > ----8<--------8<--------8<--------8<--------8<--------8<--------8<--------8<----
> > > >
> > > >
> > > > > Let me think about it, I don't know what is worse in
> > > > > such a case. Probably you are right and we have the right to assume that
> > > > > PostgreSQL shouldn't be started outside of cluster control.
> > > >
> > > > If postmaster was already started outside of heartbeat control,
> > > > then it should return OCF_SUCCESS and the postmaster should
> > > > continue to run.
> > > >
> > > > Power failure is one of the most typical situations that we want
> > > > to recover from with HA software, so this 'cleanup in start' is
> > > > important, I think.
> > > >
> > > > Maybe it would be nice if we put a WARN log before removing it.
> > > >
> > > > Thanks,
> > > >
> > > > >
> > > > >>
> > > > >>
> > > > >> >
> > > > >> > On 2/23/07, Serge Dubrouski <sergeyfd@gmail.com> wrote:
> > > > >> >> I like the idea of the patch, but honestly I don't like how it's
> > > > >> >> implemented. It should call (as Andrew suggested) the "monitor" function to
> > > > >> >> check whether pgsql is up or down instead of spreading the same code all
> > > > >> >> around the script. I'd like to review the idea and prepare another
> > > > >> >> patch if everybody agrees.
> > > > >>
> > > > >> Yes, using the same monitor function would be better.
> > > > >> I didn't do that just because it will dump many log entries every
> > > > >> second when it takes time to start.
> > > > >> It is OK if you don't mind it.
> > > > >
> > > > > Don't think that this is a problem. Those files are big even without
> > > > > those records.
> > > > >
> > > > > Thanks for all these proposals.
> > > > >
> > > > >>
> > > > >> Thanks,
> > > > >> --
> > > > >> Keisuke MORI
> > > > >> NTT DATA Intellilink Corporation
> > > > >> _______________________________________________________
> > > > >> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> > > > >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > > > >> Home Page: http://linux-ha.org/
> > > > >>
> > > >
> > > > --
> > > > Keisuke MORI
> > > > Open Source Business Division
> > > > NTT DATA Intellilink Corporation
> > > > Tel: +81-3-3534-4811 / Fax: +81-3-3534-4814
>
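
For reference, the retry logging suggested in the thread above could look
roughly like the sketch below. It is only an illustration and not the code
from the attached patch: ocf_log is replaced by a local stand-in so the
snippet runs outside the cluster, and pgsql_is_up, wait_for_start and
READY_MARKER are hypothetical names made up for the example.

```shell
#!/bin/sh
# Stand-in for ocf_log so the sketch runs without the OCF shell functions.
# A real RA would source the shared functions instead of defining this.
ocf_log() {
    level=$1; shift
    echo "$level: $*"
}

OCF_SUCCESS=0

# Hypothetical marker file used only to simulate "the database is up".
READY_MARKER=${READY_MARKER:-/tmp/pgsql_ready.marker}

# Hypothetical probe. A real RA would call its status/monitor function here.
pgsql_is_up() {
    [ -e "$READY_MARKER" ]
}

# Wait for the database with no internal limit, logging every retry.
# The start timeout configured in the cluster is expected to be the only
# limit on how long this may take.
wait_for_start() {
    while ! pgsql_is_up
    do
        ocf_log info "PostgreSQL is not up yet, retrying"
        sleep 1
    done
    ocf_log info "PostgreSQL is running"
    return $OCF_SUCCESS
}
```

With the log line in place it is visible during manual troubleshooting that
the script is waiting rather than hanging, and the RA still has no limit of
its own that could disagree with the configured timeout.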

["pgsql.in.patch" (application/octet-stream)]
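
The "check first, clean up second" order discussed in the thread, together
with the WARN log Keisuke proposed, could be sketched as follows. This is an
assumption-laden example and not the attached patch: ocf_log is a local
stand-in, and pidfile_is_live and cleanup_stale_pidfile are hypothetical
helpers. The liveness test here only checks that the PID in the file belongs
to some running process, which is weaker than what a real pgsql_status
should do.

```shell
#!/bin/sh
# Stand-in for ocf_log so the sketch runs without the OCF shell functions.
ocf_log() {
    level=$1; shift
    echo "$level: $*"
}

# Hypothetical status check: the PID file exists and its first line names
# a process that can be signalled. A real check should also verify that
# the process is actually the postmaster.
pidfile_is_live() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(head -n 1 "$PIDFILE")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Returns 0 if an instance is already running (PID file left untouched),
# 1 if nothing is running (any leftover PID file has been removed).
cleanup_stale_pidfile() {
    if pidfile_is_live
    then
        ocf_log info "PostgreSQL is already running. PID=$(head -n 1 "$PIDFILE")"
        return 0
    fi
    if [ -f "$PIDFILE" ]
    then
        # Typical after a power failure: no stop action ever ran.
        ocf_log warn "Removing stale PID file $PIDFILE"
        rm -f "$PIDFILE"
    fi
    return 1
}
```

Because the file is removed only after the liveness check has failed, the
PID file of a running instance is never touched.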


