--bcaec519644b31845604d814e0a5
Content-Type: text/plain; charset=ISO-8859-1

Hey David,

The java-exec framework is not far enough along that it makes sense for me
to push it externally yet. However, I did push my initial WIP physical plan
approach. You can find it here:
https://github.com/jacques-n/incubator-drill/tree/physical_plan_updates

Hopefully, I will get further along on the java-exec stuff soon. I'd
suggest that you focus your energy on the StorageEngine API and HBase
implementation. If you're up for it, let's do a quick Skype chat to sync
up. Let me know your availability over the next few days.

Thanks,
Jacques

On Fri, Mar 15, 2013 at 6:59 PM, David Alves wrote:

> that'd be great thanks.
>
> -david
>
> On Mar 15, 2013, at 8:51 PM, Jacques Nadeau wrote:
>
> > I've been under the weather the last few days and haven't made much
> > progress. Let me see if I can get you something tomorrow.
> >
> > On Mar 15, 2013, at 2:36 PM, David Alves wrote:
> >
> >> Hi Jacques
> >>
> >> Is there any chance we could get a preview of this physical plan
> >> stuff and basic plumbing for distributed execution before the
> >> weekend? Maybe in a GitHub branch somewhere?
> >> I mean it doesn't have to be complete or even running, I'd just like
> >> to make some progress with other stuff, and keeping it in line with
> >> whichever plumbing you already have would be great.
> >>
> >> Best
> >> David
> >>
> >> On Mar 13, 2013, at 3:12 PM, Jacques Nadeau wrote:
> >>
> >>> I'm working on some physical plan stuff as well as some basic
> >>> plumbing for distributed execution. It's very much in progress, so I
> >>> need to clean things up a bit before we could collaborate/divide and
> >>> conquer on it. Depending on your timing and availability, maybe I
> >>> could put some of this together in the next couple of days so that
> >>> you could plug in rather than reinvent.
> >>> In the meantime, pushing forward the builder stuff, additional test
> >>> cases on the reference interpreter and/or thinking through the
> >>> logical plan storage engine pushdown/rewrite could be very useful.
> >>>
> >>> Let me know your thoughts.
> >>>
> >>> thanks,
> >>> Jacques
> >>>
> >>> On Wed, Mar 13, 2013 at 9:47 AM, David Alves wrote:
> >>>
> >>>> Hi Jacques
> >>>>
> >>>> I can assign issues to myself now, thanks.
> >>>> What you say wrt the logical/physical/execution layers sounds
> >>>> good.
> >>>> My main concern, for the moment, is to have something working as
> >>>> fast as possible, i.e. some daemons that I'd be able to deploy to a
> >>>> working HBase cluster and send them work to do in some form (first
> >>>> step would be to treat it as a non-distributed engine where each
> >>>> daemon runs an instance of the prototype).
> >>>> Here's where I'd like to go next:
> >>>> - lay the groundwork for the daemons (scripts/rpc iface/wiring
> >>>>   protocol).
> >>>> - create an execution engine iface that allows us to abstract
> >>>>   future implementations, and make it available through the rpc
> >>>>   iface. This would sit in front of the ref impl for now and would
> >>>>   be replaced by cpp down the line.
> >>>>
> >>>> I think we can probably concentrate on the capabilities iface a
> >>>> bit down the line but, as a first approach, I see it simply
> >>>> providing a simple set of ops that it is able to run internally.
> >>>> How to abstract locality/partitioning/schema capabilities is still
> >>>> not clear to me though, thoughts?
> >>>>
> >>>> David
> >>>>
> >>>> On Mar 13, 2013, at 11:12 AM, Jacques Nadeau wrote:
> >>>>
> >>>>> I'm working on a presentation that will better illustrate the
> >>>>> layers. There are actually three key plans. Thinking to date has
> >>>>> been to break the plans down into logical, physical and execution.
> >>>>> The third hasn't been expressed well here and is entirely an
> >>>>> internal domain of the execution engine. Following some classic
> >>>>> methods: Logical expresses what we want to do, Physical expresses
> >>>>> how we want to do it (adding points of parallelization but not
> >>>>> specifying particular amounts of parallelization or node by node
> >>>>> assignments). The execution engine is then responsible for
> >>>>> determining the amount of parallelization of a particular plan,
> >>>>> taking into account system load (likely leveraging Berkeley's
> >>>>> Sparrow work), task priority and specific data locality
> >>>>> information, then building sub-DAGs to be assigned to individual
> >>>>> nodes and executing the plan.
> >>>>>
> >>>>> So in the higher logical and physical levels, a single Scan and
> >>>>> subsequent ScanPOP should be okay... (ScanROPs have a separate
> >>>>> problem since they ignore the level of separation we're planning
> >>>>> for the real execution layer. This is why the current ref impl
> >>>>> turns a single Scan into potentially a union of ScanROPs... not
> >>>>> elegant but logically correct.)
> >>>>>
> >>>>> The capabilities interface still needs to be defined for how a
> >>>>> storage engine reveals its logical capabilities and thus consumes
> >>>>> part of the plan.
> >>>>>
> >>>>> J
> >>>>>
> >>>>>
> >>>>> On Tue, Mar 12, 2013 at 10:19 PM, David Alves wrote:
> >>>>>
> >>>>>> Hi Lisen
> >>>>>>
> >>>>>> Some of what you are saying, such as pushing ops like filter,
> >>>>>> projection or partial aggregation down below the storage engine
> >>>>>> scanner level, or sub-tree execution, is actively being discussed
> >>>>>> in issues DRILL-13 (Storage Engine Interface) and DRILL-15 (HBase
> >>>>>> storage engine); your input on these issues is most welcome.
> >>>>>>
> >>>>>> HBase in particular has the notion of
> >>>>>> endpoints/coprocessors/filters that allow pushing this down
> >>>>>> easily (this is also in line with what other parallel database
> >>>>>> over NoSQL implementations like Tajo do).
> >>>>>> A possible approach is to have the optimizer change the order of
> >>>>>> the ops to place them below the storage engine scanner and let
> >>>>>> the SE impl deal with it internally.
> >>>>>>
> >>>>>> There are also some other pieces missing at the moment AFAIK,
> >>>>>> like a distributed metadata store, the Drill daemons, wiring,
> >>>>>> etc.
> >>>>>>
> >>>>>> So in summary, you're absolutely right, and if you're
> >>>>>> particularly interested in the HBase SE impl (as I am, for the
> >>>>>> moment) I'd be interested in collaborating.
> >>>>>>
> >>>>>> Best
> >>>>>> David
> >>>>>>
> >>>>>>
> >>>>>> On Mar 12, 2013, at 11:44 PM, Lisen Mu wrote:
> >>>>>>
> >>>>>>> Hi David,
> >>>>>>>
> >>>>>>> Very nice to see your effort on this.
> >>>>>>>
> >>>>>>> Hi Jacques,
> >>>>>>>
> >>>>>>> We are also extending the Drill prototype to see if there is any
> >>>>>>> chance it can meet our production needs. However, we find that
> >>>>>>> implementing a performant HBase storage engine is not so
> >>>>>>> straightforward and requires some workarounds. The problem is in
> >>>>>>> the Scan interface.
> >>>>>>>
> >>>>>>> In Drill's physical plan model, ScanROP is in charge of the
> >>>>>>> table scan. The storage engine provides output for a whole data
> >>>>>>> source, a CSV file for example. That's sufficient for an input
> >>>>>>> source like a plain file, but for HBase it's not very efficient,
> >>>>>>> if not impossible, to let ScanROP retrieve a whole htable into
> >>>>>>> Drill.
> >>>>>>> Storage engines like HBase should have some ability to do part
> >>>>>>> of the DrQL query, like Filter, if a filter can be performed by
> >>>>>>> specifying startRowKey and endRowKey. A storage engine like
> >>>>>>> MySQL could do more, even Join.
> >>>>>>>
> >>>>>>> Generally, it would be clearer if a ScanROP were mapped to a
> >>>>>>> sub-DAG of the logical plan DAG instead of a single Scan node in
> >>>>>>> the logical plan. If so, more implementation-specific
> >>>>>>> information would be coupled into the plan optimization &
> >>>>>>> transformation phase. I guess that's the price to pay when
> >>>>>>> optimization comes, or is there another way I failed to see?
> >>>>>>>
> >>>>>>> Please correct me if anything is wrong.
> >>>>>>>
> >>>>>>> thanks,
> >>>>>>>
> >>>>>>> Lisen
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Mar 13, 2013 at 9:33 AM, David Alves
> >>>>>>> <davidralves@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi Jacques
> >>>>>>>>
> >>>>>>>> I've submitted a first pass patch to DRILL-15.
> >>>>>>>> I did this mostly because HBase will be my main target and
> >>>>>>>> because I wanted to get a feel for what would be a nice
> >>>>>>>> interface for DRILL-13. I have some thoughts that I will post
> >>>>>>>> soon.
> >>>>>>>> btw: I still can't assign issues to myself in JIRA, did you
> >>>>>>>> forget to add me as a contributor?
> >>>>>>>>
> >>>>>>>> Best
> >>>>>>>> David
> >>>>>>>>
> >>>>>>>> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau wrote:
> >>>>>>>>
> >>>>>>>>> Hey David,
> >>>>>>>>>
> >>>>>>>>> These sound good. I've added you as a contributor on JIRA so
> >>>>>>>>> you can assign tasks to yourself. I think 45 and 46 are good
> >>>>>>>>> places to start. 15 depends on 13, and working on the two
> >>>>>>>>> hand in hand would probably be a good idea.
> >>>>>>>>> Maybe we could do a design discussion on 15 and 13 here once
> >>>>>>>>> you have some time to focus on it.
> >>>>>>>>>
> >>>>>>>>> Jacques
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves
> >>>>>>>>> <davidralves@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi All
> >>>>>>>>>>
> >>>>>>>>>> I have a new academic project for which I'd like to use
> >>>>>>>>>> Drill, since none of the other parallel database over
> >>>>>>>>>> Hadoop/NoSQL implementations fit just right.
> >>>>>>>>>> To this end I've been tinkering with the prototype, trying
> >>>>>>>>>> to find where I'd be most useful.
> >>>>>>>>>>
> >>>>>>>>>> Here's where I'd like to start, if you agree:
> >>>>>>>>>> - implement HBase storage engine (DRILL-15)
> >>>>>>>>>>   - start with simple scanning and push down of
> >>>>>>>>>>     selection/projection
> >>>>>>>>>> - implement the LogicalPlanBuilder (DRILL-45)
> >>>>>>>>>> - set up coding style in the wiki (formatting/imports etc,
> >>>>>>>>>>   DRILL-46)
> >>>>>>>>>> - create builders for all logical plan elements/make logical
> >>>>>>>>>>   plans immutable (no issue for this, I'd like to hear your
> >>>>>>>>>>   thoughts first).
> >>>>>>>>>>
> >>>>>>>>>> Please let me know your thoughts, and if you agree please
> >>>>>>>>>> assign the issues to me (it seems that I can't assign them
> >>>>>>>>>> myself).
> >>>>>>>>>>
> >>>>>>>>>> Best
> >>>>>>>>>> David Alves
> >>
> >
--bcaec519644b31845604d814e0a5--