List:       drill-dev
Subject:    Re: contribution
From:       Jacques Nadeau <jacques () apache ! org>
Date:       2013-03-24 0:13:57
Message-ID: CAKa9qDkfD42-wvyHgHd78soj4pt2SFgaKFSBW7V8w2h-S9c6rA () mail ! gmail ! com


Not yet.  I will share as soon as I get something cohesive together.

Thanks,
Jacques

On Fri, Mar 22, 2013 at 12:06 PM, David Alves <davidralves@gmail.com> wrote:

> Hey Jacques
>
>         Sorry to be a nag, but is there any chance to take a sneak peek at
> the protobuf rpc stuff?
>         I'd really like to hack something together wrt the daemon this
> weekend.
>         Also, wrt configuration management (zk/helix) maybe you could
> post the iface so that it'd be possible to hack something static (i.e.
> non-ft, properties file based) just to make dist execution work.
>
> Thanks
> David
>
> On Mar 16, 2013, at 8:34 PM, Jacques Nadeau <jacques@apache.org> wrote:
>
> > Hey David,
> >
> > The java-exec framework is not far enough along that it makes sense for
> > me to push it externally yet.  However, I did push my initial wip
> > physical plan approach.  You can find it here:
> > https://github.com/jacques-n/incubator-drill/tree/physical_plan_updates
> >
> > Hopefully, I will get further along on the java-exec stuff soon.
> >
> > I'd suggest that you focus your energy on the StorageEngine API and HBase
> > implementation.  If you're up for it, let's do a quick skype chat to sync
> > up.  Let me know your availability over the next few days.
> >
> > Thanks,
> > Jacques
> >
> >
> >
> > On Fri, Mar 15, 2013 at 6:59 PM, David Alves <davidralves@gmail.com>
> > wrote:
> >
> >> that'd be great thanks.
> >>
> >> -david
> >>
> >> On Mar 15, 2013, at 8:51 PM, Jacques Nadeau <jacques.drill@gmail.com>
> >> wrote:
> >>
> >>> I've been under the weather the last few days and haven't made much
> >>> progress. Let me see if I can get you something tomorrow.
> >>>
> >>> On Mar 15, 2013, at 2:36 PM, David Alves <davidralves@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Jacques
> >>>>
> >>>>  Is there any chance we could get a preview of this physical plan
> >>>> stuff and basic plumbing for distributed execution before the weekend?
> >>>> Maybe in a github branch somewhere?
> >>>>  I mean it doesn't have to be complete or even running, I'd just like
> >>>> to make some progress with other stuff and keeping it in line with
> >>>> whichever plumbing you already have would be great.
> >>>>
> >>>> Best
> >>>> David
> >>>>
> >>>> On Mar 13, 2013, at 3:12 PM, Jacques Nadeau <jacques@apache.org>
> >>>> wrote:
> >>>>
> >>>>> I'm working on some physical plan stuff as well as some basic plumbing
> >>>>> for distributed execution.  It's very much in progress, so I need to
> >>>>> clean things up a bit before we could collaborate/divide and conquer on
> >>>>> it.  Depending on your timing and availability, maybe I could put some
> >>>>> of this together in the next couple of days so that you could plug in
> >>>>> rather than reinvent.  In the meantime, pushing forward the builder
> >>>>> stuff, additional test cases on the reference interpreter and/or
> >>>>> thinking through the logical plan storage engine pushdown/rewrite could
> >>>>> be very useful.
> >>>>>
> >>>>> Let me know your thoughts.
> >>>>>
> >>>>> thanks,
> >>>>> Jacques
> >>>>>
> >>>>> On Wed, Mar 13, 2013 at 9:47 AM, David Alves <davidralves@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Jacques
> >>>>>>
> >>>>>>     I can assign issues to myself now, thanks.
> >>>>>>     What you say wrt the logical/physical/execution layers sounds
> >>>>>> good.
> >>>>>>     My main concern, for the moment, is to have something working as
> >>>>>> fast as possible, i.e. some daemons that I'd be able to deploy to a
> >>>>>> working hbase cluster and send them work to do in some form (first step
> >>>>>> would be to treat it as a non-distributed engine where each daemon runs
> >>>>>> an instance of the prototype).
> >>>>>>     Here's where I'd like to go next:
> >>>>>>     - lay the groundwork for the daemons (scripts/rpc iface/wiring
> >>>>>> protocol).
> >>>>>>     - create an execution engine iface that abstracts future
> >>>>>> implementations, and make it available through the rpc iface. This
> >>>>>> would sit in front of the ref impl for now and would be replaced by cpp
> >>>>>> down the line.
> >>>>>>
> >>>>>>     I think we can probably concentrate on the capabilities iface a
> >>>>>> bit down the line but, as a first approach, I see it simply providing
> >>>>>> a simple set of ops that it is able to run internally.
> >>>>>>     How to abstract locality/partitioning/schema capabilities is
> >>>>>> still not clear to me though, thoughts?
> >>>>>>
> >>>>>> David
> >>>>>>
> >>>>>> On Mar 13, 2013, at 11:12 AM, Jacques Nadeau <jacques@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> I'm working on a presentation that will better illustrate the layers.
> >>>>>>> There are actually three key plans.  Thinking to date has been to break
> >>>>>>> the plans down into logical, physical and execution.  The third hasn't
> >>>>>>> been expressed well here and is entirely an internal domain to the
> >>>>>>> execution engine.  Following some classic methods: Logical expresses
> >>>>>>> what we want to do, Physical expresses how we want to do it (adding
> >>>>>>> points of parallelization but not specifying particular amounts of
> >>>>>>> parallelization or node by node assignments).  The execution engine is
> >>>>>>> then responsible for determining the amount of parallelization of a
> >>>>>>> particular plan along with system load (likely leveraging Berkeley's
> >>>>>>> Sparrow work), task priority and specific data locality information,
> >>>>>>> building sub-dags to be assigned to individual nodes and executing the
> >>>>>>> plan.
> >>>>>>>
> >>>>>>> So in the higher logical and physical levels, a single Scan and
> >>>>>>> subsequent ScanPOP should be okay...  (ScanROPs have a separate problem
> >>>>>>> since they ignore the level of separation we're planning for the real
> >>>>>>> execution layer.  This is why the current ref impl turns a single Scan
> >>>>>>> into potentially a union of ScanROPs... not elegant but logically
> >>>>>>> correct.)
> >>>>>>>
> >>>>>>> The capabilities interface still needs to be defined for how a storage
> >>>>>>> engine reveals its logical capabilities and thus consumes part of the
> >>>>>>> plan.
> >>>>>>>
> >>>>>>> J
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Mar 12, 2013 at 10:19 PM, David Alves <davidralves@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Lisen
> >>>>>>>>
> >>>>>>>>    Some of what you are saying, like push down of ops like filter,
> >>>>>>>> projection or partial aggregation below the storage engine scanner
> >>>>>>>> level, or sub tree execution, is actively being discussed in issues
> >>>>>>>> DRILL-13 (Storage Engine Interface) and DRILL-15 (HBase storage
> >>>>>>>> engine); your input in these issues is most welcome.
> >>>>>>>>
> >>>>>>>>    HBase in particular has the notion of
> >>>>>>>> endpoints/coprocessors/filters that allow pushing this down easily
> >>>>>>>> (this is also in line with what other parallel database over nosql
> >>>>>>>> implementations like tajo do).
> >>>>>>>>    A possible approach is to have the optimizer change the order of
> >>>>>>>> the ops to place them below the storage engine scanner and let the
> >>>>>>>> SE impl deal with it internally.
> >>>>>>>>
> >>>>>>>>    There are also some other pieces missing at the moment AFAIK,
> >>>>>>>> like a distributed metadata store, the drill daemons, wiring, etc.
> >>>>>>>>
> >>>>>>>>    So in summary, you're absolutely right, and if you're
> >>>>>>>> particularly interested in the HBase SE impl (as I am, for the moment)
> >>>>>>>> I'd be interested in collaborating.
> >>>>>>>>
> >>>>>>>> Best
> >>>>>>>> David
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mar 12, 2013, at 11:44 PM, Lisen Mu <immars@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi David,
> >>>>>>>>>
> >>>>>>>>> Very nice to see your effort on this.
> >>>>>>>>>
> >>>>>>>>> Hi Jacques,
> >>>>>>>>>
> >>>>>>>>> we are also extending the drill prototype, to see if there is any
> >>>>>>>>> chance to meet our production needs. However, we find that implementing
> >>>>>>>>> a performant HBase storage engine is not so straightforward, and
> >>>>>>>>> requires some workarounds. The problem is in the Scan interface.
> >>>>>>>>>
> >>>>>>>>> In drill's physical plan model, ScanROP is in charge of table scan.
> >>>>>>>>> Storage engine provides output for a whole data source, a csv file for
> >>>>>>>>> example. That's sufficient for an input source like a plain file, but
> >>>>>>>>> for hbase, it's not very efficient, if not impossible, to let ScanROP
> >>>>>>>>> retrieve a whole htable into drill. Storage engines like HBase should
> >>>>>>>>> have some ability to do part of the DrQL query, like Filter, if a
> >>>>>>>>> filter can be performed by specifying startRowKey and endRowKey. A
> >>>>>>>>> storage engine like mysql could do more, even Join.
> >>>>>>>>>
> >>>>>>>>> Generally, it would be clearer if a ScanROP were mapped to a sub-DAG
> >>>>>>>>> of the logical plan DAG instead of a single Scan node in the logical
> >>>>>>>>> plan. If so, more implementation-specific information would be coupled
> >>>>>>>>> into the plan optimization & transformation phase. I guess that's the
> >>>>>>>>> price to pay when optimization comes, or is there another way I failed
> >>>>>>>>> to see?
> >>>>>>>>>
> >>>>>>>>> Please correct me if anything is wrong.
> >>>>>>>>>
> >>>>>>>>> thanks,
> >>>>>>>>>
> >>>>>>>>> Lisen
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Mar 13, 2013 at 9:33 AM, David Alves <davidralves@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Jacques
> >>>>>>>>>>
> >>>>>>>>>>   I've submitted a first pass patch to DRILL-15.
> >>>>>>>>>>   I did this mostly because HBase will be my main target and
> >>>>>>>>>> because I wanted to get a feel of what would be a nice interface for
> >>>>>>>>>> DRILL-13. Have some thoughts that I will post soon.
> >>>>>>>>>>   btw: I still can't assign issues to myself in JIRA, did you
> >>>>>>>>>> forget to add me as a contributor?
> >>>>>>>>>>
> >>>>>>>>>> Best
> >>>>>>>>>> David
> >>>>>>>>>>
> >>>>>>>>>> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <jacques@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hey David,
> >>>>>>>>>>>
> >>>>>>>>>>> These sound good.  I've added you as a contributor on jira so you
> >>>>>>>>>>> can assign tasks to yourself.  I think 45 and 46 are good places to
> >>>>>>>>>>> start.  15 depends on 13 and working on the two hand in hand would
> >>>>>>>>>>> probably be a good idea.  Maybe we could do a design discussion on 15
> >>>>>>>>>>> and 13 here once you have some time to focus on it.
> >>>>>>>>>>>
> >>>>>>>>>>> Jacques
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves <davidralves@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi All
> >>>>>>>>>>>>
> >>>>>>>>>>>>  I have a new academic project for which I'd like to use drill
> >>>>>>>>>>>> since none of the other parallel database over hadoop/nosql
> >>>>>>>>>>>> implementations fit just right.
> >>>>>>>>>>>>  To this end I've been tinkering with the prototype trying to
> >>>>>>>>>>>> find where I'd be most useful.
> >>>>>>>>>>>>
> >>>>>>>>>>>>  Here's where I'd like to start, if you agree:
> >>>>>>>>>>>>  - implement HBase storage engine (DRILL-15)
> >>>>>>>>>>>>          - start with simple scanning and push down of
> >>>>>>>>>>>> selection/projection
> >>>>>>>>>>>>  - implement the LogicalPlanBuilder (DRILL-45)
> >>>>>>>>>>>>  - set up coding style in the wiki (formatting/imports etc,
> >>>>>>>>>>>> DRILL-46)
> >>>>>>>>>>>>  - create builders for all logical plan elements/make logical
> >>>>>>>>>>>> plans immutable (no issue for this, I'd like to hear your thoughts
> >>>>>>>>>>>> first).
> >>>>>>>>>>>>
> >>>>>>>>>>>>  Please let me know your thoughts, and if you agree please
> >>>>>>>>>>>> assign the issues to me (it seems that I can't assign them myself).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best
> >>>>>>>>>>>> David Alves
> >>>>
> >>
> >>
>
>

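For readers following the thread: the row-key push-down that Lisen and David
discuss (absorbing a Filter into the scan via startRowKey/endRowKey, leaving
everything else above the scan) can be sketched roughly as below. This is only
an illustration under stated assumptions, not Drill's or HBase's API: the
Filter and Scan classes, the push_down_row_key_filters function and the
"row_key" column name are all hypothetical, and plain Python string ordering
stands in for HBase's byte-wise row-key ordering.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Filter:
    """Hypothetical logical filter: <column> <op> <value>."""
    column: str
    op: str      # comparison operator, e.g. ">=", "<", "="
    value: str


@dataclass(frozen=True)
class Scan:
    """Hypothetical scan node with an optional row-key range."""
    table: str
    start_row: Optional[str] = None  # inclusive lower bound
    stop_row: Optional[str] = None   # exclusive upper bound


def push_down_row_key_filters(
    scan: Scan, filters: List[Filter], row_key_column: str = "row_key"
) -> Tuple[Scan, List[Filter]]:
    """Fold row-key range filters into the scan; return the rest as residual."""
    start, stop = scan.start_row, scan.stop_row
    residual: List[Filter] = []
    for f in filters:
        if f.column == row_key_column and f.op == ">=":
            # Keep the tightest (largest) lower bound seen so far.
            start = f.value if start is None else max(start, f.value)
        elif f.column == row_key_column and f.op == "<":
            # Keep the tightest (smallest) upper bound seen so far.
            stop = f.value if stop is None else min(stop, f.value)
        else:
            # Anything the storage engine cannot absorb stays above the scan.
            residual.append(f)
    return Scan(scan.table, start, stop), residual
```

Called with a row-key range and one unrelated filter, the function returns a
narrowed Scan plus the unrelated filter, which the engine would still have to
evaluate on top of the scan output. A real capabilities interface would need
to cover far more (projection, partial aggregation, locality), which the
thread leaves open.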
