
List:       mesos-dev
Subject:    Re: One more error question
From:       Matei Zaharia <matei () eecs ! berkeley ! edu>
Date:       2012-01-28 8:14:31
Message-ID: D7CBFDD6-E3AD-4D1D-9C07-8F0FF6EBBA8E () eecs ! berkeley ! edu

This is very likely because the AMI has an older version of Mesos. We should make a new AMI.
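The NoSuchMethodError in the trace quoted below fits this diagnosis. Its message is a JVM method descriptor, and decoding it shows that the Hadoop code was compiled against a `Protos.Resource` class with a `getScalar()` method returning `Protos.Value.Scalar`, which the Mesos classes loaded at runtime apparently lack. The Python sketch below decodes such descriptors; `decode_missing_method` and `_parse_type` are hypothetical helpers (not part of Mesos, Hadoop or the EC2 scripts), and the parser only covers the primitive, array and class types of the JVM descriptor grammar.

```python
# Field-type codes used in JVM method descriptors (JVM spec, section 4.3).
_BASE_TYPES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_type(desc: str, i: int):
    """Parse one type starting at desc[i]; return (java_name, next_index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":
        end = desc.index(";", i)
        # Internal names use '/' between packages and '$' for nested classes.
        name = desc[i + 1:end].replace("/", ".").replace("$", ".")
        i = end + 1
    else:
        name = _BASE_TYPES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_missing_method(message: str) -> str:
    """Turn 'pkg.Cls.method(params)Ret' from a NoSuchMethodError into Java-like text."""
    owner_and_name, rest = message.split("(", 1)
    params_desc, ret_desc = rest.split(")", 1)
    params, i = [], 0
    while i < len(params_desc):
        param, i = _parse_type(params_desc, i)
        params.append(param)
    ret, _ = _parse_type(ret_desc, 0)
    return "%s %s(%s)" % (ret, owner_and_name.replace("$", "."), ", ".join(params))
```

For the message in the trace, `decode_missing_method` returns `org.apache.mesos.Protos.Value.Scalar org.apache.mesos.Protos.Resource.getScalar()`, i.e. the accessor the trunk-built JobTracker expects to find in the Mesos protobuf classes.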

The -d git option in the script seems to be broken too, so we should fix that. In theory it should work; I think it broke when we moved the repository to its new location (and maybe changed its internal structure too).
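The rsync failure in the quoted traceback surfaces only as a bare CalledProcessError from deploy_files(). Exit status 255 usually points at the ssh transport rather than at rsync itself: ssh exits with 255 when it hits an error of its own (host, key or connection), and rsync typically reports that status as "unexplained error (code 255)". A minimal sketch, assuming Python 3 and hypothetical helper names (`build_rsync_command`, `describe_rsync_failure` and `deploy_dir` are not in mesos_ec2.py), of a deploy step that builds the same rsync call and reports that case more readably:

```python
import shlex
import subprocess
from typing import List


def build_rsync_command(local_dir: str, host: str, identity_file: str) -> List[str]:
    """Build an argv list mirroring the rsync call shown in the traceback."""
    # rsync runs this string as its remote shell; quote the key path in case it has spaces.
    ssh_cmd = "ssh -o StrictHostKeyChecking=no -i " + shlex.quote(identity_file)
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    source = local_dir.rstrip("/") + "/"
    return ["rsync", "-rv", "-e", ssh_cmd, source, "root@" + host + ":/"]


def describe_rsync_failure(returncode: int) -> str:
    """Map an rsync exit status to a hint (255 is assumed to come from ssh)."""
    if returncode == 255:
        return "ssh transport failed (status 255): check host name, key file and security group"
    return "rsync exited with status %d: see the EXIT VALUES section of rsync(1)" % returncode


def deploy_dir(local_dir: str, host: str, identity_file: str) -> None:
    """Copy local_dir to / on host, raising a readable error on failure."""
    cmd = build_rsync_command(local_dir, host, identity_file)
    try:
        # No shell=True: each argument reaches rsync exactly as given.
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        raise RuntimeError("deploy to %s failed: %s"
                           % (host, describe_rsync_failure(err.returncode))) from err
```

Passing an argument list instead of a shell string avoids a second layer of quoting around the ssh command; the rsync flags themselves are the ones from the quoted error.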

Matei

On Jan 27, 2012, at 9:36 PM, Matthew Rathbone wrote:

> When I spin up Mesos using the EC2 scripts and redeploy both HDFS and Hadoop using
> Cloudera's distribution, I see this error when I try to start the JobTracker:
> 12/01/28 05:23:28 INFO util.HostsFileReader: Setting the includes file to
> 12/01/28 05:23:28 INFO util.HostsFileReader: Setting the excludes file to
> 12/01/28 05:23:28 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> 12/01/28 05:23:28 INFO mapred.JobTracker: Decommissioning 0 nodes
> 12/01/28 05:23:28 INFO mapred.FrameworkScheduler: Got resource offer value: "201201280508-0-5"
> Exception in thread "Thread-20" java.lang.NoSuchMethodError: org.apache.mesos.Protos$Resource.getScalar()Lorg/apache/mesos/Protos$Value$Scalar;
>     at org.apache.hadoop.mapred.FrameworkScheduler.getResource(FrameworkScheduler.java:176)
>     at org.apache.hadoop.mapred.FrameworkScheduler.getResource(FrameworkScheduler.java:183)
>     at org.apache.hadoop.mapred.FrameworkScheduler.resourceOffers(FrameworkScheduler.java:203)
>
> It seems to be stopping the JobTracker from launching new tasks.
> 
> I was wondering whether this is a version conflict between the Mesos I've built against
> (trunk) and the version of Mesos installed on the AMI, since the error seems to come from
> the generated protobuf library.
> 
> 
> To try to solve this, I attempted to spin up a cluster passing -d git (to have the
> latest code pulled from git), but then I get a string of crazy Python exceptions:
> rsync error: unexplained error (code 255) at /SourceCache/rsync/rsync-40/rsync/io.c(452) [sender=2.6.9]
> Traceback (most recent call last):
>   File "./mesos_ec2.py", line 541, in <module>
>     main()
>   File "./mesos_ec2.py", line 450, in main
>     setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, True)
>   File "./mesos_ec2.py", line 304, in setup_cluster
>     deploy_files(conn, "deploy." + opts.os, opts, master_nodes, slave_nodes, zoo_nodes)
>   File "./mesos_ec2.py", line 415, in deploy_files
>     subprocess.check_call(command, shell=True)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 462, in check_call
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command 'rsync -rv -e 'ssh -o StrictHostKeyChecking=no -i /Users/matthew/id-foursquare' '/var/folders/CK/CKzwG+5sFuSjDMUTvdmWfk+++TI/-Tmp-/tmpFmfdmB/' 'root@ec2.amazonaws.com:/'' returned non-zero exit status 255
>
> Do you think version conflicts are the likely reason for this failure?
> 
> -- 
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> matthew@foursquare.com | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
> 

