'Re: contribution'


List:       drill-dev
Subject:    Re: contribution
From:       David Alves <davidralves () gmail ! com>
Date:       2013-03-13 5:19:32
Message-ID: 35C8E250-24D7-4279-8A57-95B5550D590D () gmail ! com

Hi Lisen

	Some of what you are describing, such as pushing ops like filter, projection, or
partial aggregation down below the storage engine scanner level, or sub-tree
execution, is actively being discussed in issues DRILL-13 (Storage Engine Interface)
and DRILL-15 (HBase storage engine). Your input on these issues is most welcome.

	HBase in particular has the notion of endpoints/coprocessors/filters that make this
kind of push-down easy (this is also in line with what other parallel database over
NoSQL implementations, like Tajo, do).  A possible approach is to have the optimizer
reorder the ops to place them below the storage engine scanner and let the SE impl
deal with them internally.
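
	To make the idea concrete, here is a rough sketch of that rewrite. It is a toy model in Python, not Drill code: Scan, RowKeyRangeFilter, push_down, ToyStorageEngine and execute are all hypothetical names made up for illustration, and the "storage engine" is just a dict. The only HBase behaviour it assumes is that a scan's start row is inclusive and its stop row is exclusive.

```python
from dataclasses import dataclass
from typing import Optional


# Toy plan nodes. These names are hypothetical, not Drill's classes.
@dataclass(frozen=True)
class Scan:
    table: str
    start_row: Optional[str] = None  # inclusive lower bound on the row key
    stop_row: Optional[str] = None   # exclusive upper bound on the row key


@dataclass(frozen=True)
class RowKeyRangeFilter:
    child: object
    start_row: str
    stop_row: str


def push_down(node):
    """Optimizer step: fold a row-key range filter into the scan below it."""
    if isinstance(node, RowKeyRangeFilter):
        child = push_down(node.child)
        if isinstance(child, Scan):
            # Intersect the filter's range with any range already on the scan.
            start = node.start_row
            if child.start_row is not None:
                start = max(start, child.start_row)
            stop = node.stop_row
            if child.stop_row is not None:
                stop = min(stop, child.stop_row)
            return Scan(child.table, start, stop)
        return RowKeyRangeFilter(child, node.start_row, node.stop_row)
    return node


class ToyStorageEngine:
    """Stand-in for an SE impl; each table is a dict of row key -> value."""

    def __init__(self, tables):
        self.tables = tables

    def scan(self, scan):
        rows = self.tables[scan.table]
        for key in sorted(rows):
            if scan.start_row is not None and key < scan.start_row:
                continue
            if scan.stop_row is not None and key >= scan.stop_row:
                break  # keys are sorted, so nothing later can match
            yield key, rows[key]


def execute(node, engine):
    """Run a plan; a filter left above the scan is applied row by row."""
    if isinstance(node, Scan):
        yield from engine.scan(node)
    elif isinstance(node, RowKeyRangeFilter):
        for key, value in execute(node.child, engine):
            if node.start_row <= key < node.stop_row:
                yield key, value
```

	Calling push_down(RowKeyRangeFilter(Scan("t"), "b", "d")) gives back a single Scan that carries the range, so the engine only has to touch the matching rows. The un-rewritten plan returns the same rows but reads the whole table first, which is the cost Lisen is pointing at.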

	There are also some other pieces missing at the moment, AFAIK, such as a distributed
metadata store, the Drill daemons, wiring, etc.

	So in summary, you're absolutely right, and if you're particularly interested in the
HBase SE impl (as I am, for the moment), I'd be interested in collaborating.

Best
David

	
On Mar 12, 2013, at 11:44 PM, Lisen Mu <immars@gmail.com> wrote:

> Hi David,
> 
> Very nice to see your effort on this.
> 
> Hi Jacques,
> 
> We are also extending the Drill prototype to see whether it can meet our
> production needs. However, we find that implementing a performant HBase
> storage engine is not so straightforward and requires some workarounds.
> The problem is in the Scan interface.
> 
> In Drill's physical plan model, ScanROP is in charge of table scans. A
> storage engine provides output for a whole data source, a CSV file for
> example. That is sufficient for an input source like a plain file, but for
> HBase it is inefficient, if not impossible, to let ScanROP retrieve a whole
> HTable into Drill. Storage engines like HBase should have some ability to do
> part of the DrQL query, like Filter, if a filter can be performed by
> specifying startRowKey and endRowKey. A storage engine like MySQL could do
> more, even Join.
> 
> Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of the
> logical plan DAG instead of a single Scan node in the logical plan. If so,
> more implementation-specific information would be coupled into the plan
> optimization & transformation phase. I guess that's the price to pay when
> optimization comes in, or is there another way that I failed to see?
> 
> Please correct me if anything is wrong.
> 
> thanks,
> 
> Lisen
> 
> 
> 
> On Wed, Mar 13, 2013 at 9:33 AM, David Alves <davidralves@gmail.com> wrote:
> 
> > Hi Jacques
> > 
> > I've submitted a first-pass patch to DRILL-15.
> > I did this mostly because HBase will be my main target and because
> > I wanted to get a feel for what would be a nice interface for DRILL-13. I
> > have some thoughts that I will post soon.
> > btw: I still can't assign issues to myself in JIRA, did you forget
> > to add me as a contributor?
> > 
> > Best
> > David
> > 
> > On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <jacques@apache.org> wrote:
> > 
> > > Hey David,
> > > 
> > > These sound good.  I've added you as a contributor on JIRA so you can
> > > assign tasks to yourself.  I think 45 and 46 are good places to start.
> > > 15 depends on 13, and working on the two hand in hand would probably be
> > > a good idea.  Maybe we could do a design discussion on 15 and 13 here
> > > once you have some time to focus on it.
> > > 
> > > Jacques
> > > 
> > > 
> > > On Mon, Mar 11, 2013 at 3:02 AM, David Alves <davidralves@gmail.com> wrote:
> > > 
> > > > Hi All
> > > > 
> > > > I have a new academic project for which I'd like to use Drill,
> > > > since none of the other parallel database over Hadoop/NoSQL
> > > > implementations fit just right.
> > > > To this end I've been tinkering with the prototype, trying to find
> > > > where I'd be most useful.
> > > > 
> > > > Here's where I'd like to start, if you agree:
> > > > - implement HBase storage engine (DRILL-15)
> > > >   - start with simple scanning and push-down of selection/projection
> > > > - implement the LogicalPlanBuilder (DRILL-45)
> > > > - set up coding style in the wiki (formatting/imports etc., DRILL-46)
> > > > - create builders for all logical plan elements/make logical plans
> > > >   immutable (no issue for this, I'd like to hear your thoughts first).
> > > > 
> > > > Please let me know your thoughts, and if you agree please assign
> > > > the issues to me (it seems that I can't assign them myself).
> > > > 
> > > > Best
> > > > David Alves
> > 
> > 

