List:       hadoop-user
Subject:    Re: difference between development and production platform???
From:       "Hamedani, Masoud" <masoud () agape ! hanyang ! ac ! kr>
Date:       2011-09-29 4:02:05
Message-ID: CAMHV67Rpkwq=ZcASSbVL-jgsjF5=t5n1mx4q-mLJ7bFp_e3jaA () mail ! gmail ! com


Dear Steve,

Thanks for your useful comments; I completely agree with you. Personally,
I have used only Fedora, Java, Java-related technologies, and open source
software in all of my projects for more than 10 years.
But this is a critical situation: all of the current data and applications in
our university lab are deployed on the Microsoft platform.
We can transfer our data from Windows to Linux, but all of the code is
written in C#. We could connect the C# code to Hadoop and run it on Linux too,
but personally I can't guarantee the result.
*SO, IN SUMMARY*:
1- we can use only Linux machines as the production platform,
2- and use Windows only as a *development platform* in pseudo-distributed
mode.

Am I right about 1 and 2? Please correct or confirm them.

Thanks,
BS.
Masoud,

2011/9/28 Steve Loughran <stevel@apache.org>

> On 28/09/11 04:19, Hamedani, Masoud wrote:
>
>> Special thanks for your help, Arko.
>>
>> You mean that in Hadoop the NameNode, DataNodes, JobTracker, TaskTrackers
>> and the whole cluster should be deployed on Linux machines?
>> We have lots of data (on Windows) and code (written in C#) for data
>> mining, and we want to use Hadoop and connect
>> our existing systems and programs to it.
>> As you mentioned, we should move all of our data to Linux systems,
>> run the existing C# code on Linux, and use Windows only for
>> development, same as before.
>> Am I right?
>>
>>
> What is really meant is "nobody runs hadoop at scale on Windows".
>
> Specifically
>  -there's an expectation that there is a unix API you can exec
>  -some of the operations (e.g. how programs are exec()'d) are optimised for
> linux
>  -everyone tests on 50+ node clusters on Linux.
>
> Why Linux? Stable, low cost. And you can install it on your laptop/desktop
> and develop there too.
>
>
> Because everyone uses Linux (or possibly a genuine Unix system like
> Solaris), problems encountered in real systems get found on Linux and fixed.
>
> If you want to run a production Hadoop cluster on Windows, you are free to
> do so. Just be aware that you may be the first person to do so at scale, so
> you get to find problems first, you get to file the bugs -and because you
> are the only person with these problems and the ability to replicate them-
> you get to fix them.
>
> Nobody is going to say "oh, this patch is for Windows only use, we will
> reject it" -at least provided it doesn't have adverse effects on Linux/Unix.
> It's just that nobody else publicly runs Hadoop on Windows. A key step 1
> will be cross compiling all the native code to Windows, which on 0.23+ also
> means protocol buffers. Enjoy.
>
> Where you will find problems is that even on Win64, Hadoop can't directly
> load or run C# apps or anything else written to compile against their
> managed runtime (I forget its name). You will have to bridge via streaming,
> and take a performance hit.
>
> You could also try running the C# code under Mono on Linux; it may or may
> not work. Again, you get to find out and fix the problems -this time with
> the Mono project.
>
> -Steve
>
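
For anyone unsure what "bridge via streaming" involves: Hadoop Streaming runs
any executable as the mapper or reducer, feeds it records on stdin, and reads
tab-separated key/value lines back from stdout. A C# program would have to
follow the same contract. Below is a minimal word-count sketch of that
contract in Python, not anything from Steve's mail; the function names and the
"map"/"reduce" command-line argument are made up for illustration.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper side: emit one 'word<TAB>1' line per word read from the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer side: input arrives sorted by key, so sum consecutive counts."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


# Hypothetical entry point: run as "script.py map" or "script.py reduce".
if __name__ == "__main__" and len(sys.argv) == 2:
    step = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for out in step(sys.stdin):
        print(out)
```

The two commands would be passed to the streaming jar through its -mapper and
-reducer options. Every record crosses a pipe as text in both directions,
which is where the performance hit comes from.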

