
List:       mesos-user
Subject:    Re: Mesos Master / Slave communications issues
From:       Devin Carlen <devin.carlen () gmail ! com>
Date:       2015-02-25 7:33:20
Message-ID: etPan.54ed7ac0.6a5f7029.41c () murandy ! local

Thanks all, figured it out - the env variable for the hostname passed into
mesos-master was being set wrong. Thanks for the input!

Devin




On February 24, 2015 at 2:21:49 PM, Ken Sipe (kensipe@gmail.com) wrote:

It appears your configuration is off, as you suspected. The master registration
should NOT be 127.0.0.1 or 127.0.1.1. For each master, if you configure the IP
in a file named ip under `/etc/mesos-master`, you should be good (after
restarting the master).

My configuration under /etc/mesos-master looks like this:
/etc/mesos-master/
├── cluster
├── hostname
├── ip
├── quorum
├── registry
└── work_dir

These are just plain text files. ip has the internal IP of the master, hostname
has the FQDN of the master, cluster is the name of the cluster, etc.
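
A minimal sketch of that layout, assuming each file holds a single value on one line and that the file name is what identifies the setting. Every value below (the IP, hostname, cluster name, quorum, registry, and work_dir) is a hypothetical placeholder, and the files are written to a scratch directory rather than /etc/mesos-master:

```python
import tempfile
from pathlib import Path

# Hypothetical placeholder values; substitute the real ones for each master.
EXAMPLE_SETTINGS = {
    "cluster": "example-cluster",
    "hostname": "master1.example.internal",
    "ip": "10.0.0.5",
    "quorum": "1",
    "registry": "replicated_log",
    "work_dir": "/var/lib/mesos",
}


def write_master_config(config_dir, settings):
    """Write one plain-text file per setting, named after the setting."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    for name, value in settings.items():
        (config_dir / name).write_text(f"{value}\n")
    # Return the file names so the resulting layout is easy to inspect.
    return sorted(path.name for path in config_dir.iterdir())


if __name__ == "__main__":
    # A scratch directory stands in for /etc/mesos-master here.
    with tempfile.TemporaryDirectory() as scratch:
        print(write_master_config(scratch, EXAMPLE_SETTINGS))
```

Whether a given deployment reads these files at all depends on how mesos-master is launched, so treat this as an illustration of the layout only.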

good luck!
ken

On Feb 24, 2015, at 4:06 PM, Kenneth Su <su.kench@gmail.com> wrote:

Hi Devin,

I am new to Mesos as well. I just configured it and had the same problem as you.

For your reference, my fix was to use the actual master IP instead; the slave
then picked it up and connected. I suspect that when the master advertises
127.0.0.1, the slave uses that address and ends up connecting to itself, which
is why it never reaches the master.

Hope it helps!

Kenneth

On Tue, Feb 24, 2015 at 2:50 PM, Devin Carlen <devin.carlen@gmail.com> wrote:
Hello all,

I'm new to Mesos but have recently started trying to stand up a cluster using
BOSH. There is a BOSH release for it at
https://github.com/cf-platform-eng/mesos-boshrelease that is under active
development.

I was able to successfully deploy the cluster, however the slaves are not
communicating with the master. Upon investigation I found that the leader
election is happening properly with ZooKeeper. For this test I only have 1 Mesos
master, 3 Mesos slaves, and 1 ZooKeeper instance. All are running on their own
VMs. The single master gets elected upon startup:

I0224 21:20:40.716702 12024 contender.cpp:243] New candidate (id='0') has entered the contest for leadership
I0224 21:20:40.717182 12024 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:20:40.717718 12030 group.cpp:629] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0224 21:20:40.722229 12030 detector.cpp:351] A new leading master (UPID=master@127.0.0.1:80) is detected
I0224 21:20:40.722367 12030 master.cpp:734] The newly elected leader is master@127.0.0.1:80
I0224 21:20:40.722394 12030 master.cpp:742] Elected as the leading master!

I thought it odd that the IP listed here is 127.0.0.1. I have not specified
localhost anywhere and I explicitly specify --ip=0.0.0.0 in my mesos-master
command.
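
One quick check for this symptom, sketched under the assumption that the advertised address comes from resolving the machine's host name (the thread does not confirm that mechanism), is to see what the local resolver returns for the name and whether the result is a loopback address. The helper names are illustrative only:

```python
import ipaddress
import socket


def is_loopback(address):
    """True for any address in 127.0.0.0/8, e.g. 127.0.0.1 or 127.0.1.1."""
    return ipaddress.ip_address(address).is_loopback


def describe_resolution(hostname):
    """Resolve hostname with the local resolver and label the result."""
    address = socket.gethostbyname(hostname)
    kind = "loopback" if is_loopback(address) else "not loopback"
    return f"{hostname} -> {address} ({kind})"


if __name__ == "__main__":
    # Run on the master VM: a loopback result here would be consistent with
    # the 127.0.0.1 seen in the master's log.
    print(describe_resolution(socket.gethostname()))
```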

The slave sees the election happen, but then appears to connect to 127.0.0.1:80:

I0224 21:24:18.892083 17316 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:24:18.892290 17316 group.cpp:629] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0224 21:24:18.894039 17316 detector.cpp:351] A new leading master (UPID=master@127.0.0.1:80) is detected
I0224 21:24:18.894130 17316 slave.cpp:500] New master detected at master@127.0.0.1:80
I0224 21:24:18.894383 17316 slave.cpp:525] Detecting new master
I0224 21:24:18.894443 17316 status_update_manager.cpp:162] New master detected at master@127.0.0.1:80
I0224 21:24:18.894630 17320 slave.cpp:1957] master@127.0.0.1:80 exited
W0224 21:24:18.894665 17320 slave.cpp:1960] Master disconnected! Waiting for a new master to be elected
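
To spot this condition in a longer log, a small script can pull out every advertised UPID and flag the ones that point at loopback. This is a sketch that assumes the `UPID=name@a.b.c.d:port` shape seen in the lines above; the regular expression and function names are not part of Mesos:

```python
import ipaddress
import re

# Matches the "UPID=master@127.0.0.1:80" form that appears in the log above.
UPID_PATTERN = re.compile(
    r"UPID=(?P<name>[^@\s]+)@(?P<host>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)"
)


def extract_upids(log_text):
    """Return (name, host, port) for every UPID found in log_text."""
    return [
        (match["name"], match["host"], int(match["port"]))
        for match in UPID_PATTERN.finditer(log_text)
    ]


def loopback_upids(log_text):
    """Return only the UPIDs whose host is a loopback address."""
    return [
        upid for upid in extract_upids(log_text)
        if ipaddress.ip_address(upid[1]).is_loopback
    ]


SAMPLE_LINE = (
    "I0224 21:24:18.894039 17316 detector.cpp:351] A new leading master "
    "(UPID=master@127.0.0.1:80) is detected"
)

if __name__ == "__main__":
    print(loopback_upids(SAMPLE_LINE))
```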

At this point the slave never successfully connects. Just to verify, I also
checked what ZooKeeper was reporting:

$ /zkCli.sh get /mesos/info_0000000000

201502242120-16777343-80-12000��P"master@127.0.0.1:80
cZxid = 0x20
ctime = Tue Feb 24 21:20:40 UTC 2015
mZxid = 0x20
mtime = Tue Feb 24 21:20:40 UTC 2015
pZxid = 0x20
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14bbd711b6e0012
dataLength = 60
numChildren = 0

So somehow the IP 127.0.0.1 is written instead of the correct IP. Any thoughts
on how I can fix this?
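
The readable prefix of that znode, 201502242120-16777343-80-12000, appears to carry the same information. Assuming the four dash-separated fields are a start timestamp, the IPv4 address as a 32-bit integer, the port, and the process ID (an interpretation of the format, not something the thread confirms), the integer 16777343 is 0x0100007F, which decodes to 127.0.0.1 when its bytes are read in little-endian order:

```python
import socket
import struct


def decode_master_id(master_id):
    """Split an ID like '201502242120-16777343-80-12000' into its fields.

    Assumption: the second field holds the IPv4 address bytes in network
    order, stored as an integer on a little-endian machine.
    """
    started, ip_int, port, pid = master_id.split("-")
    packed = struct.pack("<I", int(ip_int))  # back to the 4 address bytes
    return {
        "started": started,
        "ip": socket.inet_ntoa(packed),
        "port": int(port),
        "pid": int(pid),
    }


if __name__ == "__main__":
    print(decode_master_id("201502242120-16777343-80-12000"))
```

Under that reading, the loopback address was already baked into the master's ID when it started, which matches the UPID in the logs.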

Best,

Devin

