List:       vdsm-devel
Subject:    Re: [ovirt-devel] OST: HE vm does not restart on HC setup
From:       Francesco Romani <fromani () redhat ! com>
Date:       2017-02-22 15:34:49
Message-ID: ae64d4e1-5853-bed3-c516-6335ace350b1 () redhat ! com

On 02/22/2017 03:42 PM, Yaniv Kaul wrote:
> 
> 
> On Wed, Feb 22, 2017 at 4:32 PM Francesco Romani <fromani@redhat.com> wrote:
> 
> On 02/22/2017 01:53 PM, Simone Tiraboschi wrote:
> > 
> > 
> > On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
> > 
> > When ovirt-ha-agent checks the status of the engine VM we get:
> > 
> > 2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
> >     ret = func(*args, **kwargs)
> >   File "/usr/share/vdsm/API.py", line 335, in getStats
> >     vm = self.vm
> >   File "/usr/share/vdsm/API.py", line 130, in vm
> >     raise exception.NoSuchVM(vmId=self._UUID)
> > NoSuchVM: Virtual machine does not exist: {'vmId': u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
> >
> > While in ovirt-ha-agent logs we have:
> > 
> > MainThread::INFO::2017-02-21 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state UnknownLocalVmState (score: 3400)
> > ...
> >
> > MainThread::INFO::2017-02-21 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
> >
> > Probably it's a bug or a regression somewhere on master.
> > 
> > On the ovirt-ha-broker side, detection relies on a strict string
> > match: the error message must be exactly 'Virtual machine does not
> > exist' for us to set the down status; otherwise we set the unknown
> > status, as in this case:
> > https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54
> >  
> > 
> > Adding Francesco here to understand if something has recently
> > changed there on vdsm side.
> It has changed indeed; we had a series of changes which added
> context to some exceptions. I believe the straw that broke the
> camel's back was I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5. It is
> unfortunate that this change broke HA. Could you perhaps fix it by
> checking that the message *begins* with that string, and/or by
> checking the error code? Bests,
> 
> 
> On the bright side, this is exactly why we need o-s-t running
> Hosted-Engine - though we probably need to exercise more HE flows
> (global and local maint., for example).
> On the downside, how come I32ec3f86f8d53f8412f4c0526fc85e2a42e30ea5
> was merged on Jan 1st, and we only saw the regression now? Is there
> another bug that hid this one until now?
> Y.
> 

It was merged to master on Jan 29 and backported to the 4.1 branch on
Feb 8 (because it was part of the vmleases feature, needed on 4.1.z).
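
To illustrate the fix suggested earlier in the thread (match the start of
the message and/or the error code instead of the whole string), here is a
minimal sketch. It is not the actual ovirt-hosted-engine-ha code: the
function name, the constants, and the numeric value of the error code are
assumptions made for the example and should be checked against vdsm.

```python
# Sketch only: hypothetical names, not the real engine_health.py submonitor.

# Message prefix taken from the vdsm log quoted above.
NO_SUCH_VM_PREFIX = "Virtual machine does not exist"

# Assumed numeric code for the "no such VM" error; verify against vdsm.
NO_SUCH_VM_CODE = 1


def engine_vm_status(message, code=None):
    """Return 'down' if the error means the VM does not exist, else 'unknown'."""
    # Prefer the error code when the caller has one: it does not change
    # when extra context is appended to the message text.
    if code is not None and code == NO_SUCH_VM_CODE:
        return "down"
    # Fall back to a prefix match, so added context such as
    # ": {'vmId': ...}" no longer breaks the detection.
    if message.startswith(NO_SUCH_VM_PREFIX):
        return "down"
    return "unknown"
```

With the message from the log above, a strict equality test against
'Virtual machine does not exist' fails, while the prefix check still
reports the VM as down.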

Bests,

-- 
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani





_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel