List:       drill-dev
Subject:    Re: contribution
From:       Lisen Mu <immars () gmail ! com>
Date:       2013-03-13 6:23:20
Message-ID: CAHgK2mEukoKrxyz0Cx_ZEM=+gbNd-ckdB559XRj8PnspfbRGmQ () mail ! gmail ! com

David,

That's great; you are making the point clearer in DRILL-13. I'll come back
there when I have a better handle on this.

On Wed, Mar 13, 2013 at 1:19 PM, David Alves <davidralves@gmail.com> wrote:

> Hi Lisen
>
>         Some of what you are saying, like pushing down ops such as filter,
> projection, or partial aggregation below the storage engine scanner level,
> or sub-tree execution, is actively being discussed in DRILL-13
> (Storage Engine Interface) and DRILL-15 (HBase storage engine). Your input
> on these issues is most welcome.
>
>         HBase in particular has the notion of
> endpoints/coprocessors/filters that allow pushing this down easily (this is
> also in line with what other parallel-database-over-NoSQL implementations
> like Tajo do).
>         A possible approach is to have the optimizer change the order of
> the ops to place them below the storage engine scanner and let the SE impl
> deal with it internally.
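A minimal sketch of the reordering described above, assuming a toy logical plan built from hypothetical Scan, Filter and Project nodes (illustrative only, not Drill's actual classes). A bottom-up rewrite folds the predicates of a Filter that sits directly on a Scan into the Scan whenever a caller-supplied `can_push` check says the storage engine can evaluate them. Whatever the engine rejects stays in a Filter above the Scan.

```python
from dataclasses import dataclass, replace
from typing import Callable, Union


@dataclass(frozen=True)
class Scan:
    table: str
    # Predicates the storage engine has agreed to evaluate itself (hypothetical field).
    pushed_filters: tuple = ()


@dataclass(frozen=True)
class Filter:
    child: "Node"
    predicates: tuple


@dataclass(frozen=True)
class Project:
    child: "Node"
    columns: tuple


Node = Union[Scan, Filter, Project]


def push_down_filters(node: Node, can_push: Callable[[tuple], bool]) -> Node:
    """Bottom-up rewrite: fold a Filter sitting directly on a Scan into the Scan."""
    if isinstance(node, Project):
        return Project(push_down_filters(node.child, can_push), node.columns)
    if isinstance(node, Filter):
        child = push_down_filters(node.child, can_push)
        if isinstance(child, Scan):
            pushable = tuple(p for p in node.predicates if can_push(p))
            remaining = tuple(p for p in node.predicates if not can_push(p))
            scan = replace(child, pushed_filters=child.pushed_filters + pushable)
            # Keep a Filter above the Scan only for predicates the SE rejected.
            return Filter(scan, remaining) if remaining else scan
        return Filter(child, node.predicates)
    return node  # a bare Scan is a leaf: nothing to rewrite


# Example: an SE that can only evaluate row-key predicates (hypothetical rule).
plan = Project(
    Filter(Scan("users"), (("row_key", ">=", "u100"), ("age", ">", 30))),
    ("name",),
)
optimized = push_down_filters(plan, can_push=lambda p: p[0] == "row_key")
```

Delegating the decision to `can_push` is one way to let each SE impl decide what it handles internally. A real optimizer would also have to treat projection and partial aggregation, which this sketch leaves out.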
>
>         There are also some other pieces missing at the moment AFAIK, like
> a distributed metadata store, the drill daemons, wiring, etc.
>
>         So in summary, you're absolutely right, and if you're particularly
> interested in the HBase SE impl (as I am, for the moment) I'd be interested
> in collaborating.
>
> Best
> David
>
>
> On Mar 12, 2013, at 11:44 PM, Lisen Mu <immars@gmail.com> wrote:
>
> > Hi David,
> >
> > Very nice to see your effort on this.
> >
> > Hi Jacques,
> >
> > We are also extending the Drill prototype to see whether it can meet our
> > production needs. However, we find that implementing a performant HBase
> > storage engine is not straightforward and requires some workarounds. The
> > problem is in the Scan interface.
> >
> > In Drill's physical plan model, ScanROP is in charge of table scans. The
> > storage engine provides output for a whole data source, a CSV file for
> > example. That is sufficient for an input source like a plain file, but for
> > HBase it's not very efficient, if not impossible, to let ScanROP retrieve
> > a whole HTable into Drill. Storage engines like HBase should be able to
> > perform part of the DrQL query, such as a Filter, when that filter can be
> > expressed by specifying startRowKey and endRowKey. A storage engine like
> > MySQL could do even more, even a Join.
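A minimal sketch of the startRowKey/endRowKey case, assuming row keys are compared as raw bytes and the query's filter is a conjunction of simple comparisons. `Predicate`, `ScanSpec` and `plan_scan` are hypothetical names, not Drill or HBase APIs. The sketch mirrors HBase's convention of an inclusive start row and an exclusive stop row. Predicates that cannot become a key range come back as a residual for Drill to evaluate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Predicate:
    column: str   # "row_key" stands for the HBase row key (hypothetical naming)
    op: str       # one of "=", ">", ">=", "<", "<="
    value: bytes


@dataclass(frozen=True)
class ScanSpec:
    start_row: bytes            # inclusive; b"" means start of table
    stop_row: Optional[bytes]   # exclusive; None means end of table
    residual: tuple             # predicates Drill must still evaluate

    def is_empty(self) -> bool:
        return self.stop_row is not None and self.start_row >= self.stop_row


def plan_scan(predicates: List[Predicate], row_key: str = "row_key") -> ScanSpec:
    """Intersect the row-key bounds of a conjunction of predicates."""
    start: bytes = b""
    stop: Optional[bytes] = None
    residual = []
    for p in predicates:
        if p.column != row_key or p.op not in ("=", ">", ">=", "<", "<="):
            residual.append(p)
            continue
        # v + b"\x00" is the smallest byte string strictly greater than v.
        succ = p.value + b"\x00"
        lo, hi = b"", None
        if p.op in ("=", ">="):
            lo = p.value
        elif p.op == ">":
            lo = succ
        if p.op in ("=", "<="):
            hi = succ
        elif p.op == "<":
            hi = p.value
        start = max(start, lo)
        if hi is not None:
            stop = hi if stop is None else min(stop, hi)
    return ScanSpec(start, stop, tuple(residual))


spec = plan_scan([
    Predicate("row_key", ">=", b"user100"),
    Predicate("row_key", "<", b"user200"),
    Predicate("age", ">", b"30"),
])
# spec.start_row == b"user100", spec.stop_row == b"user200",
# and the "age" predicate is left in spec.residual for Drill.
```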
> >
> > Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of the
> > logical plan DAG instead of a single Scan node in the logical plan. If so,
> > more implementation-specific information would be coupled into the plan
> > optimization & transformation phase. I guess that's the price to pay once
> > optimization comes into play, or is there another way I failed to see?
> >
> > Please correct me if anything is wrong.
> >
> > thanks,
> >
> > Lisen
> >
> >
> >
> > On Wed, Mar 13, 2013 at 9:33 AM, David Alves <davidralves@gmail.com> wrote:
> >
> >> Hi Jacques
> >>
> >>        I've submitted a first-pass patch to DRILL-15.
> >>        I did this mostly because HBase will be my main target and because
> >> I wanted to get a feel for what would be a nice interface for DRILL-13. I
> >> have some thoughts that I will post soon.
> >>        BTW, I still can't assign issues to myself in JIRA; did you forget
> >> to add me as a contributor?
> >>
> >> Best
> >> David
> >>
> >> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <jacques@apache.org> wrote:
> >>
> >>> Hey David,
> >>>
> >>> These sound good.  I've added you as a contributor on JIRA so you can
> >>> assign tasks to yourself.  I think 45 and 46 are good places to start.
> >>> 15 depends on 13, and working on the two hand in hand would probably be
> >>> a good idea. Maybe we could do a design discussion on 15 and 13 here
> >>> once you have some time to focus on it.
> >>>
> >>> Jacques
> >>>
> >>>
> >>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves <davidralves@gmail.com> wrote:
> >>>
> >>>> Hi All
> >>>>
> >>>>       I have a new academic project for which I'd like to use Drill,
> >>>> since none of the other parallel-database-over-Hadoop/NoSQL
> >>>> implementations fits quite right.
> >>>>       To this end I've been tinkering with the prototype, trying to
> >>>> find where I'd be most useful.
> >>>>
> >>>>       Here's where I'd like to start, if you agree:
> >>>>       - implement HBase storage engine (DRILL-15)
> >>>>               - start with simple scanning and push-down of
> >>>>                 selection/projection
> >>>>       - implement the LogicalPlanBuilder (DRILL-45)
> >>>>       - set up coding style in the wiki (formatting/imports etc.,
> >>>>         DRILL-46)
> >>>>       - create builders for all logical plan elements/make logical
> >>>>         plans immutable (no issue for this, I'd like to hear your
> >>>>         thoughts first).
> >>>>
> >>>>       Please let me know your thoughts, and if you agree please assign
> >>>> the issues to me (it seems that I can't assign them myself).
> >>>>
> >>>> Best
> >>>> David Alves
> >>
> >>
>
>