
List:       drill-dev
Subject:    Re: contribution
From:       Jacques Nadeau <jacques () apache ! org>
Date:       2013-03-13 22:25:14
Message-ID: CAKa9qD=E-ocze0eh89nk7Ayq2a-4t6hF2OysmgFt=Qb-ZoKS9g () mail ! gmail ! com


Don't worry Tim, it is still very much on my radar.  Just well ahead of the
ref interpreter stuff.  Let me see what I can slice up in the next few days.

J

On Wed, Mar 13, 2013 at 2:40 PM, Timothy Chen <tnachen@gmail.com> wrote:

> Looking forward to the plumbing as well, since my JSON scan op has been
> sitting there for a while now :)
>
> Tim
>
>
> On Wed, Mar 13, 2013 at 2:30 PM, David Alves <davidralves@gmail.com>
> wrote:
>
> > Getting the basic plumbing to a point where we could work together on
> > it/use it elsewhere as soon as you can would be awesome.
> > As soon as I get that I can start on the daemons/scripts.
> > I'll focus on the SE iface and on HBase pushdown for the moment.
> >
> > -david
> >
> > On Mar 13, 2013, at 3:12 PM, Jacques Nadeau <jacques@apache.org> wrote:
> >
> > > I'm working on some physical plan stuff as well as some basic plumbing
> > > for distributed execution.  It's very much in progress, so I need to
> > > clean things up a bit before we can collaborate/divide and conquer on
> > > it.  Depending on your timing and availability, maybe I could put some
> > > of this together in the next couple of days so that you could plug in
> > > rather than reinvent.  In the meantime, pushing forward the builder
> > > stuff, additional test cases on the reference interpreter and/or
> > > thinking through the logical plan storage engine pushdown/rewrite could
> > > be very useful.
> > >
> > > Let me know your thoughts.
> > >
> > > thanks,
> > > Jacques
> > >
> > > On Wed, Mar 13, 2013 at 9:47 AM, David Alves <davidralves@gmail.com>
> > > wrote:
> > >
> > >> Hi Jacques
> > >>
> > >>        I can assign issues to me now, thanks.
> > >>        What you say wrt to the logical/physical/execution layers
> sounds
> > >> good.
> > >>        My main concern, for the moment is to have something working as
> > >> fast as possible, i.e. some daemons that I'd be able to deploy to a
> > working
> > >> hbase cluster and send them work to do in some form (first step would
> > be to
> > >> treat is as a non distributed engine where each daemon runs an
> instance
> > of
> > >> the prototype).
> > >>        Here's where I'd like to go next:
> > >>        - lay the ground work for the daemons (scripts/rpc iface/wiring
> > >> protocol).
> > >>        - create an execution engine iface that allows to abstract
> future
> > >> implementations, and make it available through the rpc iface. this
> would
> > >> sit in front of the ref impl for now and would be replaced by cpp down
> > the
> > >> line.
> > >>
> > >>        I think we can probably concentrate on the capabilities iface a
> > >> bit down the line but, as a first approach, I see it simply providing
> a
> > >> simple set of ops that it is able to run internally.
> > >>        How to abstract locality/partitioning/schema capabilities is
> till
> > >> not clear to me though, thoughts?
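
A minimal sketch of that first approach, in Python for brevity: the
capabilities iface is nothing more than the set of ops a storage engine can
run internally. Capabilities, split_plan and the op names are hypothetical
and are not part of the Drill prototype; locality, partitioning and schema
capabilities are left out, as they are above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical: the ops a storage engine says it can run internally."""
    supported_ops: FrozenSet[str]


def split_plan(ops_above_scan: List[str],
               caps: Capabilities) -> Tuple[List[str], List[str]]:
    """Split the ops sitting above a scan into (pushed_down, remaining).

    Ops are listed bottom-up (closest to the scan first). Only a contiguous
    prefix can be pushed down: once one op has to stay in the execution
    engine, every op above it consumes its output and has to stay as well.
    """
    pushed: List[str] = []
    for i, op in enumerate(ops_above_scan):
        if op not in caps.supported_ops:
            return pushed, ops_above_scan[i:]
        pushed.append(op)
    return pushed, []


# Assumed example: an engine that can filter and project on its own.
hbase_caps = Capabilities(frozenset({"filter", "project"}))
pushed, remaining = split_plan(
    ["filter", "project", "aggregate", "filter"], hbase_caps)
```

Here the second filter stays in the engine even though the storage engine
supports filters, because it runs on the output of the aggregate.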
> > >>
> > >> David
> > >>
> > > >> On Mar 13, 2013, at 11:12 AM, Jacques Nadeau <jacques@apache.org>
> > > >> wrote:
> > >>
> > > >>> I'm working on a presentation that will better illustrate the layers.
> > > >>> There are actually three key plans.  Thinking to date has been to
> > > >>> break the plans down into logical, physical and execution.  The third
> > > >>> hasn't been expressed well here and is entirely an internal domain of
> > > >>> the execution engine.  Following some classic methods: Logical
> > > >>> expresses what we want to do, Physical expresses how we want to do it
> > > >>> (adding points of parallelization but not specifying particular
> > > >>> amounts of parallelization or node by node assignments).  The
> > > >>> execution engine is then responsible for determining the amount of
> > > >>> parallelization of a particular plan along with system load (likely
> > > >>> leveraging Berkeley's Sparrow work), task priority and specific data
> > > >>> locality information, building sub-DAGs to be assigned to individual
> > > >>> nodes, and executing the plan.
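
A minimal sketch of the execution-layer decision described above, in Python
for brevity. It is a plain greedy heuristic (prefer a node holding the data,
then the least loaded one), standing in for whatever scheduler the engine
ends up using; it is not Sparrow and not Drill code. The function name, the
block and node ids and the load numbers are all hypothetical.

```python
from typing import Dict, List


def assign_scan_fragments(
    block_locations: Dict[str, List[str]],
    node_load: Dict[str, int],
) -> Dict[str, List[str]]:
    """Assign each data block to one node, using locality and current load.

    block_locations maps a block id to the nodes holding a copy of it.
    node_load maps every available node to its number of running tasks.
    """
    assignments: Dict[str, List[str]] = {node: [] for node in node_load}
    load = dict(node_load)  # work on a copy; the caller's dict is untouched
    for block in sorted(block_locations):
        local_nodes = [n for n in block_locations[block] if n in load]
        # Fall back to any node when no available node holds the block.
        candidates = local_nodes or sorted(load)
        # Least loaded candidate wins; ties are broken by node name.
        node = min(candidates, key=lambda n: (load[n], n))
        assignments[node].append(block)
        load[node] += 1
    return assignments


# Assumed example cluster: three nodes, n3 already busy with two tasks.
block_locations = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
node_load = {"n1": 0, "n2": 0, "n3": 2}
assignments = assign_scan_fragments(block_locations, node_load)

# The amount of parallelization falls out of the assignment rather than
# being fixed in the physical plan.
parallelism = sum(1 for blocks in assignments.values() if blocks)
```

Each non-empty entry in assignments corresponds to one sub-DAG handed to one
node; task priority is ignored in this sketch.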
> > >>>
> > > >>> So in the higher logical and physical levels, a single Scan and
> > > >>> subsequent ScanPOP should be okay...  (ScanROPs have a separate
> > > >>> problem since they ignore the level of separation we're planning for
> > > >>> the real execution layer.  This is why the current ref impl turns a
> > > >>> single Scan into potentially a union of ScanROPs... not elegant but
> > > >>> logically correct.)
> > >>>
> > > >>> The capabilities interface still needs to be defined for how a
> > > >>> storage engine reveals its logical capabilities and thus consumes
> > > >>> part of the plan.
> > >>>
> > >>> J
> > >>>
> > >>>
> > > >>> On Tue, Mar 12, 2013 at 10:19 PM, David Alves
> > > >>> <davidralves@gmail.com> wrote:
> > >>>
> > > >>>> Hi Lisen
> > >>>>
> > >>>>       Some of what you are saying like push down of ops like filter,
> > >>>> projection or partial aggregation below the storage engine scanner
> > >> level,
> > >>>> or sub tree execution are actively being discussed in issues
> DRILL-13
> > >>>> (Strorage Engine Interface) and DRILL-15 (Hbase storage engine),
> your
> > >> input
> > >>>> in these issues is most welcome.
> > >>>>
> > >>>>       HBase in particular has the notion of
> > >>>> enpoints/coprocessors/filters that allow pushing this down easily
> > (this
> > >> is
> > >>>> also in line with what other parallel database over nosql
> > >> implementations
> > >>>> like tajo do).
> > >>>>       A possible approach is to have the optimizer change the order
> of
> > >>>> the ops to place them below the storage engine scanner and let the
> SE
> > >> impl
> > >>>> deal with it internally.
> > >>>>
> > >>>>       There are also some other pieces missing at the moment AFAIK,
> > >> like
> > >>>> a distributed metadata store, the drill daemons, wiring, etc.
> > >>>>
> > >>>>       So in summary, you're absolutely right, and if you're
> > >> particularly
> > >>>> interested in the HBase SE impl (as I am, for the moment) I'd be
> > >> interested
> > >>>> in collaborating.
> > >>>>
> > >>>> Best
> > >>>> David
> > >>>>
> > >>>>
> > >>>> On Mar 12, 2013, at 11:44 PM, Lisen Mu <immars@gmail.com> wrote:
> > >>>>
> > >>>>> Hi David,
> > >>>>>
> > >>>>> Very nice to see your effort on this.
> > >>>>>
> > >>>>> Hi Jacques,
> > >>>>>
> > > >>>>> We are also extending the Drill prototype to see if there is any
> > > >>>>> chance it can meet our production needs. However, we find that
> > > >>>>> implementing a performant HBase storage engine is not so
> > > >>>>> straightforward and requires some workarounds. The problem is in
> > > >>>>> the Scan interface.
> > >>>>>
> > > >>>>> In Drill's physical plan model, ScanROP is in charge of table scans.
> > > >>>>> A storage engine provides output for a whole data source, a CSV file
> > > >>>>> for example. That's sufficient for input sources like plain files,
> > > >>>>> but for HBase it's not very efficient, if not impossible, to let
> > > >>>>> ScanROP retrieve a whole HTable into Drill. Storage engines like
> > > >>>>> HBase should have some ability to do part of the DrQL query, like
> > > >>>>> Filter, if a filter can be performed by specifying startRowKey and
> > > >>>>> endRowKey. A storage engine like MySQL could do more, even Join.
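
A minimal sketch of the row key case, in Python for brevity: predicates on
the row key are folded into a start/end pair for the scan and everything else
is left for the engine to evaluate. It assumes the predicates are ANDed
together, that keys compare as raw bytes, and that the start key is inclusive
and the end key exclusive (as with HBase scan start/stop rows). The function
name, the predicate tuples and the column names are hypothetical, not Drill
or HBase APIs.

```python
from typing import List, Optional, Tuple

# Hypothetical predicate shape: (column, operator, value), all ANDed together.
Predicate = Tuple[str, str, bytes]


def push_down_row_key_range(
    predicates: List[Predicate],
    row_key_column: str = "row_key",
) -> Tuple[Optional[bytes], Optional[bytes], List[Predicate]]:
    """Return (start_row_key, end_row_key, residual_predicates).

    Only ">=" and "<" on the row key are folded into the range; any other
    predicate is returned as residual work for the execution engine.
    """
    start: Optional[bytes] = None  # inclusive lower bound
    end: Optional[bytes] = None    # exclusive upper bound
    residual: List[Predicate] = []
    for column, op, value in predicates:
        if column == row_key_column and op == ">=":
            # Several lower bounds: the tightest (largest) one wins.
            start = value if start is None else max(start, value)
        elif column == row_key_column and op == "<":
            # Several upper bounds: the tightest (smallest) one wins.
            end = value if end is None else min(end, value)
        else:
            residual.append((column, op, value))
    return start, end, residual


start, end, residual = push_down_row_key_range([
    ("row_key", ">=", b"user100"),
    ("row_key", "<", b"user200"),
    ("age", ">", b"30"),
])
```

With this split the scan only touches rows in [user100, user200), and the
age predicate still has to be applied to whatever the scan returns.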
> > >>>>>
> > > >>>>> Generally, it would be clearer if a ScanROP were mapped to a sub-DAG
> > > >>>>> of the logical plan DAG instead of a single Scan node in the logical
> > > >>>>> plan. If so, more implementation-specific information would be
> > > >>>>> coupled into the plan optimization & transformation phase. I guess
> > > >>>>> that's the price to pay when optimization comes, or is there another
> > > >>>>> way I failed to see?
> > >>>>>
> > >>>>> Please correct me if anything is wrong.
> > >>>>>
> > >>>>> thanks,
> > >>>>>
> > >>>>> Lisen
> > >>>>>
> > >>>>>
> > >>>>>
> > > >>>>> On Wed, Mar 13, 2013 at 9:33 AM, David Alves
> > > >>>>> <davidralves@gmail.com> wrote:
> > >>>>>
> > >>>>>> Hi Jacques
> > >>>>>>
> > >>>>>>      I've submitted a fist pass patch to DRILL-15.
> > >>>>>>      I did this mostly because HBase will be my main target and
> > >>>> because
> > >>>>>> I wanted to get a feel of what would be a nice interface for
> > DRILL-13.
> > >>>> Have
> > >>>>>> some thoughts that I will post soon.
> > >>>>>>      btw: I still can't assign issues to myself in JIRA, did you
> > >>>> forget
> > >>>>>> to add me as a contributor?
> > >>>>>>
> > >>>>>> Best
> > >>>>>> David
> > >>>>>>
> > > >>>>>> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <jacques@apache.org>
> > > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hey David,
> > >>>>>>>
> > > >>>>>>> These sound good.  I've added you as a contributor on JIRA so you
> > > >>>>>>> can assign tasks to yourself.  I think 45 and 46 are good places to
> > > >>>>>>> start.  15 depends on 13 and working on the two hand in hand would
> > > >>>>>>> probably be a good idea.  Maybe we could do a design discussion on
> > > >>>>>>> 15 and 13 here once you have some time to focus on it.
> > >>>>>>>
> > >>>>>>> Jacques
> > >>>>>>>
> > >>>>>>>
> > > >>>>>>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves
> > > >>>>>>> <davidralves@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Hi All
> > >>>>>>>>
> > >>>>>>>>     I have a new academic project for which I'd like to use
> drill
> > >>>>>>>> since none of the other parallel database over hadoop/nosql
> > >>>>>> implementations
> > >>>>>>>> fit just right.
> > >>>>>>>>     To this goal I've been tinkering with the prototype trying
> to
> > >>>>>> find
> > >>>>>>>> where I'd be most useful.
> > >>>>>>>>
> > >>>>>>>>     Here's where I'd like to start, if you agree:
> > >>>>>>>>     - implement HBase storage engine (DRILL-15)
> > >>>>>>>>             - start with simple scanning an push down of
> > >>>>>>>> selection/projection
> > >>>>>>>>     - implement the LogicalPlanBuilder (DRILL-45)
> > >>>>>>>>     - setup coding style in the wiki (formatting/imports etc,
> > >>>>>> DRILL-46)
> > >>>>>>>>     - create builders for all logical plan elements/make logical
> > >>>>>> plans
> > >>>>>>>> immutable (no issue for this, I'd like to hear your thoughts
> > first).
> > >>>>>>>>
> > >>>>>>>>     Please let me know your thoughts, and if you agree please
> > >> assign
> > >>>>>>>> the issues to me (it seems that I can't assign them myself).
> > >>>>>>>>
> > >>>>>>>> Best
> > >>>>>>>> David Alves
> > >>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
>

