List:       drill-dev
Subject:    Re: contribution
From:       Jacques Nadeau <jacques () apache ! org>
Date:       2013-03-13 16:12:10
Message-ID: CAKa9qDkVFnbMpsu2cw8Cx6qGST_5A-gUtJCSzmouZAnd3CTuQA () mail ! gmail ! com


I'm working on a presentation that will better illustrate the layers.
 There are actually three key plans.  Thinking to date has been to break
the plans down into logical, physical and execution.  The third hasn't been
expressed well here and is entirely an internal domain to the execution
engine.  Following some classic methods: Logical expresses what we want to
do, Physical expresses how we want to do it (adding points of
parallelization but not specifying particular amounts of parallelization or
node by node assignments).  The execution engine is then responsible for
determining the amount of parallelization of a particular plan, taking into
account system load (likely leveraging Berkeley's Sparrow work), task
priority and specific data locality information, then building sub-DAGs to
be assigned to individual nodes and executing the plan.
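
To make the layering a bit more concrete, here is a rough Python sketch. It
is an illustration only, not Drill code: every class and function name in it
is hypothetical, and the rule for which ops are parallelizable is an
assumption made up for the example. The logical plan says what to do, the
physical plan marks where parallelization may happen without fixing a
degree, and the execution step picks the degree and builds per-node
fragments.

```python
from dataclasses import dataclass
from typing import List, Tuple

# All names below are hypothetical; none of them are Drill classes.


@dataclass(frozen=True)
class LogicalOp:
    """What we want to do, e.g. scan a table or filter rows."""
    name: str


@dataclass(frozen=True)
class PhysicalOp:
    """How we want to do it: marks a parallelization point, but no width."""
    name: str
    parallelizable: bool


@dataclass(frozen=True)
class Fragment:
    """A sub-DAG (simplified to a tuple of op names) assigned to one node."""
    node: str
    ops: Tuple[str, ...]


def to_physical(logical: List[LogicalOp]) -> List[PhysicalOp]:
    # Assumption for this sketch: only scans and filters can run in parallel.
    parallel_ops = {"scan", "filter"}
    return [PhysicalOp(op.name, op.name in parallel_ops) for op in logical]


def to_execution(physical: List[PhysicalOp], data_nodes: List[str],
                 max_width: int) -> List[Fragment]:
    """Pick the degree of parallelism, then build per-node fragments.

    data_nodes stands in for data locality; max_width stands in for a cap
    derived from system load and task priority.
    """
    width = max(1, min(len(data_nodes), max_width))
    parallel = tuple(op.name for op in physical if op.parallelizable)
    serial = tuple(op.name for op in physical if not op.parallelizable)
    fragments = [Fragment(node, parallel) for node in data_nodes[:width]]
    if serial:
        # The non-parallel tail runs once, on a hypothetical coordinator.
        fragments.append(Fragment("coordinator", serial))
    return fragments
```

Note that nothing above the execution step knows how many nodes are
involved, which is the separation the three plans are meant to give us.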

So in the higher logical and physical levels, a single Scan and subsequent
ScanPOP should be okay...  (ScanROPs have a separate problem since they
ignore the level of separation we're planning for the real execution layer.
 This is why the current ref impl turns a single Scan into potentially
a union of ScanROPs... not elegant but logically correct.)

The capabilities interface still needs to be defined for how a storage
engine reveals its logical capabilities and thus consumes part of the plan.
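
As a strawman only, one possible shape for this is sketched below in Python.
Nothing here is decided or exists in Drill: the class, the function and the
op names are all hypothetical. The idea is that a storage engine declares
which logical ops it can run itself, and the planner hands it the longest
leading run of ops above the scan that it supports.

```python
from typing import List, Set, Tuple


class StorageEngineCapabilities:
    """Hypothetical: the logical op names an engine can execute itself."""

    def __init__(self, supported_ops: Set[str]):
        self.supported_ops = frozenset(supported_ops)

    def can_consume(self, op: str) -> bool:
        return op in self.supported_ops


def split_plan(ops_above_scan: List[str],
               caps: StorageEngineCapabilities) -> Tuple[List[str], List[str]]:
    """Split the ops directly above a scan into (pushed_down, remaining).

    Only a contiguous leading run is pushed down, so the relative order of
    the ops is preserved on both sides of the split.
    """
    pushed: List[str] = []
    for i, op in enumerate(ops_above_scan):
        if not caps.can_consume(op):
            return pushed, ops_above_scan[i:]
        pushed.append(op)
    return pushed, []
```

For example, an engine declaring {"filter", "project"} would consume a
filter and a projection sitting directly above its scan and leave a join to
the rest of the plan, while an engine declaring nothing would just scan.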

J


On Tue, Mar 12, 2013 at 10:19 PM, David Alves <davidralves@gmail.com> wrote:

> Hi Lisen
>
>         Some of what you are saying, like push down of ops like filter,
> projection or partial aggregation below the storage engine scanner level,
> or sub tree execution, is actively being discussed in issues DRILL-13
> (Storage Engine Interface) and DRILL-15 (HBase storage engine). Your input
> in these issues is most welcome.
>
>         HBase in particular has the notion of
> endpoints/coprocessors/filters that allow pushing this down easily (this is
> also in line with what other parallel database over nosql implementations
> like Tajo do).
>         A possible approach is to have the optimizer change the order of
> the ops to place them below the storage engine scanner and let the SE impl
> deal with it internally.
>
>         There are also some other pieces missing at the moment AFAIK, like
> a distributed metadata store, the drill daemons, wiring, etc.
>
>         So in summary, you're absolutely right, and if you're particularly
> interested in the HBase SE impl (as I am, for the moment) I'd be interested
> in collaborating.
>
> Best
> David
>
>
> On Mar 12, 2013, at 11:44 PM, Lisen Mu <immars@gmail.com> wrote:
>
> > Hi David,
> >
> > Very nice to see your effort on this.
> >
> > Hi Jacques,
> >
> > we are also extending the drill prototype, to see if there is any chance
> > to meet our production needs. However, we find that implementing a
> > performant HBase storage engine is not so straightforward, and requires
> > some workarounds. The problem is in the Scan interface.
> >
> > In drill's physical plan model, ScanROP is in charge of table scan. The
> > storage engine provides output for a whole data source, a csv file for
> > example. That's sufficient for an input source like a plain file, but for
> > hbase it's not very efficient, if not impossible, to let ScanROP retrieve
> > a whole htable into drill. Storage engines like HBase should have some
> > ability to do part of the DrQL query, like Filter, if a filter can be
> > performed by specifying startRowKey and endRowKey. A storage engine like
> > mysql could do more, even Join.
> >
> > Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of
> > the logical plan DAG instead of a single Scan node in the logical plan.
> > If so, more implementation-specific information would be coupled into the
> > plan optimization & transformation phase. I guess that's the price to pay
> > when optimization comes, or is there another way I failed to see?
> >
> > Please correct me if anything is wrong.
> >
> > thanks,
> >
> > Lisen
> >
> >
> >
> > On Wed, Mar 13, 2013 at 9:33 AM, David Alves <davidralves@gmail.com> wrote:
> >
> >> Hi Jacques
> >>
> >>        I've submitted a first-pass patch to DRILL-15.
> >>        I did this mostly because HBase will be my main target and because
> >> I wanted to get a feel for what would be a nice interface for DRILL-13. I
> >> have some thoughts that I will post soon.
> >>        btw: I still can't assign issues to myself in JIRA, did you forget
> >> to add me as a contributor?
> >>
> >> Best
> >> David
> >>
> >> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <jacques@apache.org> wrote:
> >>
> >>> Hey David,
> >>>
> >>> These sound good.  I've added you as a contributor on jira so you can
> >>> assign tasks to yourself.  I think 45 and 46 are good places to start.
> >>> 15 depends on 13 and working on the two hand in hand would probably be a
> >>> good idea.  Maybe we could do a design discussion on 15 and 13 here once
> >>> you have some time to focus on it.
> >>>
> >>> Jacques
> >>>
> >>>
> >>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves <davidralves@gmail.com> wrote:
> >>>
> >>>> Hi All
> >>>>
> >>>>       I have a new academic project for which I'd like to use drill
> >>>> since none of the other parallel database over hadoop/nosql
> >>>> implementations fit just right.
> >>>>       To this end I've been tinkering with the prototype trying to find
> >>>> where I'd be most useful.
> >>>>
> >>>>       Here's where I'd like to start, if you agree:
> >>>>       - implement HBase storage engine (DRILL-15)
> >>>>               - start with simple scanning and push down of
> >>>> selection/projection
> >>>>       - implement the LogicalPlanBuilder (DRILL-45)
> >>>>       - set up coding style in the wiki (formatting/imports etc,
> >>>> DRILL-46)
> >>>>       - create builders for all logical plan elements/make logical plans
> >>>> immutable (no issue for this, I'd like to hear your thoughts first).
> >>>>
> >>>>       Please let me know your thoughts, and if you agree please assign
> >>>> the issues to me (it seems that I can't assign them myself).
> >>>>
> >>>> Best
> >>>> David Alves
> >>
> >>
>
>

