'Re: Do i really need HDFS?'


List:       mesos-user
Subject:    Re: Do i really need HDFS?
From:       Dick Davies <dick () hellooperator ! net>
Date:       2014-10-22 18:04:18
Message-ID: CAK5eLPRQgZek-rPvEJaheu5nmP2u8X74P_wtrWX3a0vU=aKgww () mail ! gmail ! com

I haven't got as far as deploying a FS yet - still weighing up the options.

Our Mesos cluster is just a PaaS at the moment, but I think the option to
use capacity for ad hoc distributed computing alongside the web workloads
is a killer feature.

We're soon to Dockerize as well, so an option that can be reached from
containers is pretty important too.

Ceph is a strong candidate because of its S3 compatibility, since I know
that will be usable from within Docker without any trouble when we need
non-DB persistence. That and its resilience seem a good match for Mesos'
own. I need some real-world war-story research before I can really say
it's a good alternative, though.

As I'm a Spark newbie I don't want to run before I can walk, so I'll
probably start with an HDFS deployment on the test systems to get a feel
for it first.


On 22 October 2014 17:40, CCAAT <ccaat@tampabay.rr.com> wrote:
> Ok so,
>
> I'd be curious to know your final architecture (D. Davies)?
>
> I was looking to put Ceph on top of the (3) btrfs nodes in case we need a
> DFS at some later point. We're not really sure what software will be in
> our final mix. Certainly installing Ceph does not hurt anything (?), and
> I'm not sure we want to use Ceph from userspace only. We have had
> excellent success using btrfs, so that is firm for us, short of some
> gaping problem emerging. The cluster will grow once we establish its
> basic functionality.
>
> Right now, the focus is on subsurface fluid simulations for carbon
> sequestration, but using the cluster for general (cron/Chronos) batch
> jobs is a secondary goal. So my question is: knowing that we want to
> avoid the HDFS/Hadoop setup entirely, will a local FS/DFS built on
> btrfs/Ceph be robust enough to test not only Mesos+Spark but many other
> related tools, such as R, Scala, SparkR and SQL databases? We're just
> trying to avoid some common mistakes as we move forward with Mesos.
>
> James
>
>
>
>
> On 10/22/14 02:29, Dick Davies wrote:
>>
>> I'd be interested to know what that is, if you don't mind sharing.
>>
>> We're thinking of deploying a Ceph cluster for another project anyway.
>> It seems to remove some of the chokepoints/points of failure HDFS suffers
>> from, but I've no idea how well it can interoperate with the usual HDFS
>> clients (Spark in my particular case, but I'm trying to keep this general).
>>
>> On 21 October 2014 13:16, David Greenberg <dsg123456789@gmail.com> wrote:
>>>
>>> We use Spark without HDFS. In our case, we just use Ansible to copy the
>>> Spark executors onto all hosts at the same path. We also load and store
>>> our Spark data from non-HDFS sources.
>>>
>>> On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies <dick@hellooperator.net>
>>> wrote:
>>>>
>>>>
>>>> I think Spark needs a way to send jobs to/from the workers. The Spark
>>>> distro itself will pull down the executor OK, but in my (very basic)
>>>> tests I got stuck without HDFS.
>>>>
>>>> So basically it depends on the framework. I think in Spark's case they
>>>> assume most users are migrating from an existing Hadoop deployment, so
>>>> HDFS is more or less taken for granted.
>>>>
>>>>
>>>> On 20 October 2014 23:18, CCAAT <ccaat@tampabay.rr.com> wrote:
>>>>>
>>>>> On 10/20/14 11:46, Steven Schlansker wrote:
>>>>>
>>>>>
>>>>>> We are running Mesos entirely without HDFS with no problems. We use
>>>>>> Docker to distribute our application to slave nodes, and keep no state
>>>>>> on individual nodes.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Background: I'm building up a 3-node cluster to run Mesos and Spark. No
>>>>> legacy Hadoop needed or wanted. I am using btrfs for the local file
>>>>> system, with two drives set up as RAID1 on each system.
>>>>>
>>>>> So you are suggesting that I can install Mesos + Spark + Docker
>>>>> without a DFS on these (3) machines?
>>>>>
>>>>>
>>>>> Will I need any other software? My application is a geophysical
>>>>> fluid simulator, so Scala, R, and all sorts of advanced math will
>>>>> be required on the cluster for the finite element methods.
>>>>>
>>>>>
>>>>> James
>>>>>
>>>>>
>>>
>>>
>>
>
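A minimal sketch of the HDFS-free Spark-on-Mesos setup discussed above, written as a spark-defaults.conf fragment. The property names (spark.master, spark.mesos.executor.home, spark.executor.uri) are Spark's documented Mesos settings; the master host name, install path and download URL are hypothetical placeholders, not values anyone in this thread posted.

```properties
# Hypothetical spark-defaults.conf: host names and paths are placeholders.
spark.master                 mesos://mesos-master.example.com:5050

# Option A (the Ansible approach described above): Spark is unpacked at the
# same path on every Mesos slave, so executors start from local disk and
# nothing needs to be fetched from HDFS.
spark.mesos.executor.home    /opt/spark

# Option B: executors download a Spark binary package at launch instead.
# The Mesos fetcher accepts http:// URIs as well as hdfs://, so a plain web
# server can host the tarball. When set, Spark uses this package rather
# than spark.mesos.executor.home.
#spark.executor.uri          http://files.example.com/spark/spark-bin.tgz
```

Either way, jobs can still read and write data through non-HDFS URIs; note that a plain file:// path only works if the same file exists at that path on every node.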
