David,

That's great; you are making the point clearer in DRILL-13. I'll come back to
it there once I understand this better.

On Wed, Mar 13, 2013 at 1:19 PM, David Alves wrote:

> Hi Lisen
>
> Some of what you are saying, like pushing down ops such as filter,
> projection, or partial aggregation below the storage engine scanner level,
> or sub-tree execution, is actively being discussed in issues DRILL-13
> (Storage Engine Interface) and DRILL-15 (HBase storage engine); your input
> in these issues is most welcome.
>
> HBase in particular has the notion of
> endpoints/coprocessors/filters that allow pushing this down easily (this is
> also in line with what other parallel-database-over-NoSQL implementations
> like Tajo do).
> A possible approach is to have the optimizer change the order of
> the ops to place them below the storage engine scanner and let the SE impl
> deal with it internally.
>
> There are also some other pieces missing at the moment AFAIK, like
> a distributed metadata store, the Drill daemons, wiring, etc.
>
> So in summary, you're absolutely right, and if you're particularly
> interested in the HBase SE impl (as I am, for the moment) I'd be interested
> in collaborating.
>
> Best
> David
>
>
> On Mar 12, 2013, at 11:44 PM, Lisen Mu wrote:
>
> > Hi David,
> >
> > Very nice to see your effort on this.
> >
> > Hi Jacques,
> >
> > We are also extending the Drill prototype to see if there is any chance
> > of meeting our production needs. However, we find that implementing a
> > performant HBase storage engine is not so straightforward and requires
> > some workarounds. The problem is in the Scan interface.
> >
> > In Drill's physical plan model, ScanROP is in charge of the table scan.
> > The storage engine provides output for a whole data source, a CSV file
> > for example.
> > That's sufficient for an input source like a plain file, but for HBase
> > it's very inefficient, if not impossible, to let ScanROP retrieve a whole
> > HTable into Drill. Storage engines like HBase should be able to execute
> > part of the DrQL query, such as a Filter, when the filter can be expressed
> > by specifying a startRowKey and an endRowKey. A storage engine like MySQL
> > could do more, even a Join.
> >
> > Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of
> > the logical plan DAG instead of a single Scan node in the logical plan.
> > If so, more implementation-specific information would be coupled into
> > the plan optimization and transformation phase. I guess that's the price
> > to pay once optimization comes in, or is there another way I failed to see?
> >
> > Please correct me if anything is wrong.
> >
> > Thanks,
> >
> > Lisen
> >
> >
> >
> > On Wed, Mar 13, 2013 at 9:33 AM, David Alves wrote:
> >
> >> Hi Jacques
> >>
> >> I've submitted a first-pass patch to DRILL-15.
> >> I did this mostly because HBase will be my main target and because
> >> I wanted to get a feel for what would be a nice interface for DRILL-13.
> >> I have some thoughts that I will post soon.
> >> BTW: I still can't assign issues to myself in JIRA; did you forget
> >> to add me as a contributor?
> >>
> >> Best
> >> David
> >>
> >> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau wrote:
> >>
> >>> Hey David,
> >>>
> >>> These sound good. I've added you as a contributor on JIRA so you can
> >>> assign tasks to yourself. I think 45 and 46 are good places to start.
> >>> 15 depends on 13, and working on the two hand in hand would probably be
> >>> a good idea. Maybe we could do a design discussion on 15 and 13 here
> >>> once you have some time to focus on it.
> >>>
> >>> Jacques
> >>>
> >>>
> >>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves wrote:
> >>>
> >>>> Hi All
> >>>>
> >>>> I have a new academic project for which I'd like to use Drill,
> >>>> since none of the other parallel-database-over-Hadoop/NoSQL
> >>>> implementations fit just right.
> >>>> To this end I've been tinkering with the prototype, trying to find
> >>>> where I'd be most useful.
> >>>>
> >>>> Here's where I'd like to start, if you agree:
> >>>> - implement the HBase storage engine (DRILL-15)
> >>>>   - start with simple scanning and push-down of
> >>>>     selection/projection
> >>>> - implement the LogicalPlanBuilder (DRILL-45)
> >>>> - set up the coding style in the wiki (formatting/imports etc.,
> >>>>   DRILL-46)
> >>>> - create builders for all logical plan elements/make logical plans
> >>>>   immutable (no issue for this; I'd like to hear your thoughts first)
> >>>>
> >>>> Please let me know your thoughts, and if you agree please assign
> >>>> the issues to me (it seems that I can't assign them myself).
> >>>>
> >>>> Best
> >>>> David Alves
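A minimal sketch of the row-key range push-down Lisen describes above: a filter on the HBase row key is turned into a start/stop row pair, so the scanner reads only one slice of the HTable instead of all of it. This is hypothetical Java; RowKeyRangePushdown, ScanRange, forRange, forEquals, and forPrefix are illustrative names, not Drill or HBase APIs. It assumes HBase's conventions that rows sort in unsigned lexicographic byte order, that the start row is inclusive and the stop row exclusive, and that an empty stop row means "scan to the end of the table".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch (not a Drill or HBase API): translates simple
 * predicates on the row key into a half-open [start, stop) row range.
 */
public final class RowKeyRangePushdown {

    /** Half-open row range; an empty stop array is assumed to mean "to the end of the table". */
    public static final class ScanRange {
        private final byte[] start;
        private final byte[] stop;

        ScanRange(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }

        public byte[] start() { return start.clone(); }
        public byte[] stop() { return stop.clone(); }

        /** True if the key falls in [start, stop) under unsigned lexicographic byte order. */
        public boolean contains(byte[] key) {
            boolean afterStart = Arrays.compareUnsigned(key, start) >= 0;
            boolean beforeStop = stop.length == 0 || Arrays.compareUnsigned(key, stop) < 0;
            return afterStart && beforeStop;
        }
    }

    private RowKeyRangePushdown() {}

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** rowKey >= lower AND rowKey < upper maps directly onto start/stop. */
    public static ScanRange forRange(String lowerInclusive, String upperExclusive) {
        return new ScanRange(bytes(lowerInclusive), bytes(upperExclusive));
    }

    /** rowKey = key: the smallest key sorting after `key` is `key` followed by one 0x00 byte. */
    public static ScanRange forEquals(String key) {
        byte[] start = bytes(key);
        byte[] stop = Arrays.copyOf(start, start.length + 1); // copyOf pads with 0x00
        return new ScanRange(start, stop);
    }

    /** rowKey starts with prefix: increment the last byte that is not 0xFF and cut off the rest. */
    public static ScanRange forPrefix(String prefix) {
        byte[] start = bytes(prefix);
        for (int i = start.length - 1; i >= 0; i--) {
            if (start[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(start, i + 1);
                stop[i]++; // unsigned increment; cannot wrap because the byte is not 0xFF
                return new ScanRange(start, stop);
            }
        }
        // Empty prefix or all 0xFF bytes: no finite upper bound, so scan to the end.
        return new ScanRange(start, new byte[0]);
    }

    public static void main(String[] args) {
        ScanRange users = forPrefix("user1");
        System.out.println(new String(users.stop(), StandardCharsets.UTF_8)); // user2
        System.out.println(users.contains(bytes("user1|42"))); // true
        System.out.println(users.contains(bytes("user2")));    // false
    }
}
```

Only predicates on the row key shrink the scanned range this way; a predicate on a column value would still have to be evaluated after the scan, or handed to something like the HBase filters/coprocessors David mentions.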