
List:       drill-dev
Subject:    Re: contribution
From:       Lisen Mu <immars () gmail ! com>
Date:       2013-03-13 4:44:42
Message-ID: CAHgK2mFFhPLwVx3E2ZTT77gACyxurJ5anVCJaNWJS0zZt=defw () mail ! gmail ! com


Hi David,

Very nice to see your effort on this.

Hi Jacques,

We are also extending the Drill prototype to see whether it can meet our
production needs. However, we find that implementing a performant HBase
storage engine is not straightforward and requires some workarounds. The
problem is in the Scan interface.

In Drill's physical plan model, ScanROP is in charge of table scans. The
storage engine provides output for a whole data source, a CSV file for
example. That is sufficient for an input source like a plain file, but for
HBase it is very inefficient, if not impossible, to let ScanROP retrieve a
whole HTable into Drill. Storage engines like HBase should have some ability
to execute part of the DrQL query themselves, such as a Filter, if that
filter can be performed by specifying startRowKey and endRowKey. A storage
engine like MySQL could do more, even a Join.
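
To make the row-key case concrete, here is a minimal sketch of how a
conjunction of row-key predicates could be folded into a single scan range.
It is only an illustration: RowKeyPredicate, ScanRange and push_down are
hypothetical names, not part of Drill or HBase. The one thing it borrows
from HBase is the usual scan convention that the start row is inclusive,
the stop row is exclusive, and keys compare as raw bytes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical model of a predicate on the row key (not a Drill/HBase type).
@dataclass(frozen=True)
class RowKeyPredicate:
    op: str       # ">=" and "<" can be pushed down in this sketch
    value: bytes


# Hypothetical scan range: start is inclusive, stop is exclusive.
@dataclass
class ScanRange:
    start_row: Optional[bytes] = None   # None means "from the first row"
    stop_row: Optional[bytes] = None    # None means "to the last row"


def push_down(
    predicates: List[RowKeyPredicate],
) -> Tuple[ScanRange, List[RowKeyPredicate]]:
    """Fold ANDed row-key predicates into one range.

    Returns the range plus the predicates that could not be expressed
    as a range and must still be evaluated after the scan.
    """
    rng = ScanRange()
    residual: List[RowKeyPredicate] = []
    for p in predicates:
        if p.op == ">=":
            # Keep the tightest (largest) lower bound.
            if rng.start_row is None or p.value > rng.start_row:
                rng.start_row = p.value
        elif p.op == "<":
            # Keep the tightest (smallest) upper bound.
            if rng.stop_row is None or p.value < rng.stop_row:
                rng.stop_row = p.value
        else:
            residual.append(p)
    return rng, residual
```

With predicates such as key >= "user100" AND key < "user200", the engine
would scan only that key range, and anything left in the residual list
would stay in the plan as an ordinary Filter above the scan.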

Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of the
logical plan DAG instead of a single Scan node in the logical plan. If so,
more implementation-specific information would be coupled into the plan
optimization and transformation phase. I guess that's the price to pay when
optimization comes, or is there another way that I failed to see?
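
As a rough sketch of the sub-DAG idea, assume (as a simplification) that the
plan above a scan is a linear chain of operators rather than a general DAG,
and that each storage engine declares which operator kinds it can execute
itself. The function name and the capability sets below are hypothetical
and are not taken from the Drill code base.

```python
from typing import List, Set, Tuple


def split_for_engine(
    ops_from_scan: List[str], supported: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split a linear operator chain into the part handed to the storage
    engine and the part left for the query engine to execute.

    ops_from_scan lists operators starting at the scan and moving toward
    the plan root. The engine absorbs the longest contiguous prefix of
    operators it supports; it stops at the first unsupported operator.
    """
    if not ops_from_scan or ops_from_scan[0] != "scan":
        raise ValueError("plan must start with a scan")
    cut = 1
    while cut < len(ops_from_scan) and ops_from_scan[cut] in supported:
        cut += 1
    return ops_from_scan[:cut], ops_from_scan[cut:]


# Hypothetical capability sets, for illustration only.
HBASE_LIKE = {"filter"}
MYSQL_LIKE = {"filter", "project"}

plan = ["scan", "filter", "project", "aggregate"]
print(split_for_engine(plan, HBASE_LIKE))
print(split_for_engine(plan, MYSQL_LIKE))
```

The capability sets are exactly the implementation-specific information
that would leak into the optimization phase: the optimizer has to know what
each engine supports before it can decide where to cut the plan.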

Please correct me if anything is wrong.

thanks,

Lisen



On Wed, Mar 13, 2013 at 9:33 AM, David Alves <davidralves@gmail.com> wrote:

> Hi Jacques
>
>         I've submitted a first pass patch to DRILL-15.
>         I did this mostly because HBase will be my main target and because
> I wanted to get a feel of what would be a nice interface for DRILL-13. Have
> some thoughts that I will post soon.
>         btw: I still can't assign issues to myself in JIRA, did you forget
> to add me as a contributor?
>
> Best
> David
>
> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <jacques@apache.org> wrote:
>
> > Hey David,
> >
> > These sound good.  I've added you as a contributor on jira so you can
> > assign tasks to yourself.  I think 45 and 46 are good places to start.
> > 15 depends on 13 and working on the two hand in hand would probably be
> > a good idea.  Maybe we could do a design discussion on 15 and 13 here
> > once you have some time to focus on it.
> >
> > Jacques
> >
> >
> > On Mon, Mar 11, 2013 at 3:02 AM, David Alves <davidralves@gmail.com> wrote:
> >
> >> Hi All
> >>
> >>        I have a new academic project for which I'd like to use drill
> >> since none of the other parallel database over hadoop/nosql
> >> implementations fit just right.
> >>        To this goal I've been tinkering with the prototype trying to
> >> find where I'd be most useful.
> >>
> >>        Here's where I'd like to start, if you agree:
> >>        - implement HBase storage engine (DRILL-15)
> >>                - start with simple scanning and push down of
> >> selection/projection
> >>        - implement the LogicalPlanBuilder (DRILL-45)
> >>        - setup coding style in the wiki (formatting/imports etc,
> >> DRILL-46)
> >>        - create builders for all logical plan elements/make logical
> >> plans immutable (no issue for this, I'd like to hear your thoughts
> >> first).
> >>
> >>        Please let me know your thoughts, and if you agree please assign
> >> the issues to me (it seems that I can't assign them myself).
> >>
> >> Best
> >> David Alves
>
>

