Hi Lisen

Some of what you are saying, like push down of ops such as filter, projection or partial aggregation below the storage engine scanner level, or sub tree execution, is actively being discussed in issues DRILL-13 (Storage Engine Interface) and DRILL-15 (HBase storage engine); your input in these issues is most welcome.

HBase in particular has the notion of endpoints/coprocessors/filters that allow pushing this down easily (this is also in line with what other parallel database over nosql implementations like tajo do).

A possible approach is to have the optimizer change the order of the ops to place them below the storage engine scanner and let the SE impl deal with it internally.

There are also some other pieces missing at the moment AFAIK, like a distributed metadata store, the drill daemons, wiring, etc.

So in summary, you're absolutely right, and if you're particularly interested in the HBase SE impl (as I am, for the moment) I'd be interested in collaborating.

Best
David

On Mar 12, 2013, at 11:44 PM, Lisen Mu wrote:

> Hi David,
>
> Very nice to see your effort on this.
>
> Hi Jacques,
>
> we are also extending the drill prototype, to see if there is any chance
> to meet our production need. However, we find that implementing a
> performant HBase storage engine is not so straightforward, and requires
> some workarounds. The problem is in the Scan interface.
>
> In drill's physical plan model, ScanROP is in charge of table scan. The
> storage engine provides output for a whole data source, a csv file for
> example. That is sufficient for an input source like a plain file, but
> for hbase it's not very efficient, if not impossible, to let ScanROP
> retrieve a whole htable into drill. Storage engines like HBase should
> have some ability to do part of the DrQL query, like Filter, if a filter
> can be performed by specifying startRowKey and endRowKey. A storage
> engine like mysql could do more, even Join.
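To make the startRowKey/endRowKey idea concrete, here is a minimal sketch of an optimizer rule that folds a row-key range filter into the scan below it, so the storage engine only reads the matching key range. Everything in it is an assumption made for illustration: the `Scan`, `RowKeyRange` and `Project` classes and the `push_down_row_key_filter` function are hypothetical names, not Drill's plan model or HBase's API, and row keys are compared as plain strings with an inclusive start and exclusive stop.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple, Union

# All names here are hypothetical; this is not Drill's or HBase's API.


@dataclass(frozen=True)
class Scan:
    table: str
    start_row: Optional[str] = None  # inclusive lower bound on the row key
    stop_row: Optional[str] = None   # exclusive upper bound on the row key


@dataclass(frozen=True)
class RowKeyRange:
    """Filter that keeps rows with start <= row key < stop."""
    start: str
    stop: str


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]


Op = Union[Scan, RowKeyRange, Project]


def push_down_row_key_filter(plan: Tuple[Op, ...]) -> Tuple[Op, ...]:
    """Fold a row-key range filter that directly follows a scan into that scan."""
    out = []
    for op in plan:
        prev = out[-1] if out else None
        if isinstance(op, RowKeyRange) and isinstance(prev, Scan):
            # Intersect the filter with any range the scan already carries.
            start = op.start if prev.start_row is None else max(prev.start_row, op.start)
            stop = op.stop if prev.stop_row is None else min(prev.stop_row, op.stop)
            out[-1] = replace(prev, start_row=start, stop_row=stop)
        else:
            # Anything the rule does not understand stays above the scan.
            out.append(op)
    return tuple(out)


plan = (
    Scan("events"),
    RowKeyRange("2013-03-01", "2013-03-13"),
    Project(("user", "url")),
)
optimized = push_down_row_key_filter(plan)
print(optimized)
```

The rule only handles the simplest case, a range filter sitting directly on top of a scan. Filters that cannot be expressed as a key range stay in the plan, which leaves room for the storage engine to accept more (projection, partial aggregation) later.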
>
> Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of
> the logical plan DAG instead of a single Scan node in the logical plan.
> If so, more implementation-specific information would be coupled into
> the plan optimization & transformation phase. I guess that's the price
> to pay when optimization comes, or is there another way I failed to see?
>
> Please correct me if anything is wrong.
>
> thanks,
>
> Lisen
>
> On Wed, Mar 13, 2013 at 9:33 AM, David Alves wrote:
>
>> Hi Jacques
>>
>> I've submitted a first pass patch to DRILL-15.
>> I did this mostly because HBase will be my main target and because
>> I wanted to get a feel of what would be a nice interface for DRILL-13.
>> Have some thoughts that I will post soon.
>> btw: I still can't assign issues to myself in JIRA, did you forget
>> to add me as a contributor?
>>
>> Best
>> David
>>
>> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau wrote:
>>
>>> Hey David,
>>>
>>> These sound good. I've added you as a contributor on jira so you can
>>> assign tasks to yourself. I think 45 and 46 are good places to start.
>>> 15 depends on 13 and working on the two hand in hand would probably
>>> be a good idea. Maybe we could do a design discussion on 15 and 13
>>> here once you have some time to focus on it.
>>>
>>> Jacques
>>>
>>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves wrote:
>>>
>>>> Hi All
>>>>
>>>> I have a new academic project for which I'd like to use drill
>>>> since none of the other parallel database over hadoop/nosql
>>>> implementations fit just right.
>>>> To this goal I've been tinkering with the prototype trying to find
>>>> where I'd be most useful.
>>>>
>>>> Here's where I'd like to start, if you agree:
>>>> - implement HBase storage engine (DRILL-15)
>>>>   - start with simple scanning and push down of
>>>>     selection/projection
>>>> - implement the LogicalPlanBuilder (DRILL-45)
>>>> - set up coding style in the wiki (formatting/imports etc, DRILL-46)
>>>> - create builders for all logical plan elements/make logical plans
>>>>   immutable (no issue for this, I'd like to hear your thoughts first).
>>>>
>>>> Please let me know your thoughts, and if you agree please assign
>>>> the issues to me (it seems that I can't assign them myself).
>>>>
>>>> Best
>>>> David Alves