From drill-dev Wed Mar 13 04:44:42 2013
From: Lisen Mu
Date: Wed, 13 Mar 2013 04:44:42 +0000
To: drill-dev
Subject: Re: contribution
Message-Id:
X-MARC-Message: https://marc.info/?l=drill-dev&m=142065425129392
MIME-Version: 1
Content-Type: multipart/mixed; boundary="--e89a8f23465980c03804d7c71047"

--e89a8f23465980c03804d7c71047
Content-Type: text/plain; charset=ISO-8859-1

Hi David,

Very nice to see your effort on this.

Hi Jacques,

We are also extending the Drill prototype to see whether it can meet
our production needs. However, we find that implementing a performant
HBase storage engine is not straightforward and requires some
workarounds.

The problem is in the Scan interface. In Drill's physical plan model,
ScanROP is in charge of the table scan. The storage engine provides
output for a whole data source, a CSV file for example. That is
sufficient for an input source like a plain file, but for HBase it is
not very efficient, if not impossible, to let ScanROP retrieve a whole
HTable into Drill. Storage engines like HBase should have some ability
to execute part of the DrQL query themselves, such as a Filter, when
the filter can be performed by specifying startRowKey and endRowKey.
A storage engine like MySQL could do more, even a Join.

Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of
the logical plan DAG instead of a single Scan node in the logical plan.
If so, more implementation-specific information would be coupled into
the plan optimization & transformation phase. I guess that's the price
to pay when optimization comes in, or is there another way that I
failed to see?

Please correct me if anything is wrong.

thanks,

Lisen

On Wed, Mar 13, 2013 at 9:33 AM, David Alves wrote:

> Hi Jacques
>
> I've submitted a first pass patch to DRILL-15.
> I did this mostly because HBase will be my main target and because
> I wanted to get a feel of what would be a nice interface for DRILL-13.
> I have some thoughts that I will post soon.
> btw: I still can't assign issues to myself in JIRA, did you forget
> to add me as a contributor?
>
> Best
> David
>
> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau wrote:
>
> > Hey David,
> >
> > These sound good. I've added you as a contributor on jira so you can
> > assign tasks to yourself. I think 45 and 46 are good places to start.
> > 15 depends on 13 and working on the two hand in hand would probably
> > be a good idea. Maybe we could do a design discussion on 15 and 13
> > here once you have some time to focus on it.
> >
> > Jacques
> >
> >
> > On Mon, Mar 11, 2013 at 3:02 AM, David Alves wrote:
> >
> >> Hi All
> >>
> >> I have a new academic project for which I'd like to use drill
> >> since none of the other parallel database over hadoop/nosql
> >> implementations fit just right.
> >> To this goal I've been tinkering with the prototype trying to find
> >> where I'd be most useful.
> >>
> >> Here's where I'd like to start, if you agree:
> >> - implement HBase storage engine (DRILL-15)
> >>   - start with simple scanning and push down of
> >>     selection/projection
> >> - implement the LogicalPlanBuilder (DRILL-45)
> >> - setup coding style in the wiki (formatting/imports etc, DRILL-46)
> >> - create builders for all logical plan elements/make logical plans
> >>   immutable (no issue for this, I'd like to hear your thoughts first).
> >>
> >> Please let me know your thoughts, and if you agree please assign
> >> the issues to me (it seems that I can't assign them myself).
> >>
> >> Best
> >> David Alves
> >

--e89a8f23465980c03804d7c71047--