List:       drill-dev
Subject:    Re: contribution
From:       David Alves <davidralves () gmail ! com>
Date:       2013-03-22 19:06:34
Message-ID: 98049444-11FF-45E3-BD84-46C15FA068EE () gmail ! com

Hey Jacques

	Sorry to be a nag, but is there any chance to take a sneak peek at the
protobuf rpc stuff?  I'd really like to hack something together wrt the
daemon this weekend.  Also, wrt configuration management (zk/helix),
maybe you could post the iface so that it'd be possible to hack something
static (i.e. non-ft, properties-file based) just to make dist execution
work.
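
To make the static option concrete, here is a minimal sketch of the kind of
thing I mean, under stated assumptions: the names ClusterMembership,
Endpoint and StaticClusterMembership, and the property key
drill.daemon.nodes, are all made up for illustration and are not taken from
any existing Drill code. It reads a fixed host:port list once and never
updates it, so there is no fault tolerance at all.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical membership iface; a zk/helix impl could replace the static one later. */
interface ClusterMembership {
    List<Endpoint> endpoints();
}

/** Address of one daemon: host name and rpc port. */
final class Endpoint {
    final String host;
    final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}

/** Static, non-ft membership: the node list is read once and never refreshed. */
final class StaticClusterMembership implements ClusterMembership {
    // Assumed key name, holding a comma-separated list of host:port entries.
    static final String NODES_KEY = "drill.daemon.nodes";

    private final List<Endpoint> endpoints;

    private StaticClusterMembership(List<Endpoint> endpoints) {
        this.endpoints = Collections.unmodifiableList(endpoints);
    }

    /** Parses the node list from properties-formatted text. */
    static StaticClusterMembership load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Endpoint> parsed = new ArrayList<>();
        for (String entry : props.getProperty(NODES_KEY, "").split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate an empty list and stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected host:port, got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            parsed.add(new Endpoint(host, port));
        }
        return new StaticClusterMembership(parsed);
    }

    @Override
    public List<Endpoint> endpoints() {
        return endpoints;
    }
}
```

Each daemon would load the same file at startup, e.g. with a line like
drill.daemon.nodes=node1:31010,node2:31010 (host names and ports are
placeholders).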

Thanks
David

On Mar 16, 2013, at 8:34 PM, Jacques Nadeau <jacques@apache.org> wrote:

> Hey David,
> 
> The java-exec framework is not far enough along that it makes sense for
> me to push it externally yet.  However, I did push my initial wip
> physical plan approach.  You can find it here:
> https://github.com/jacques-n/incubator-drill/tree/physical_plan_updates
> 
> Hopefully, I will get further along on the java-exec stuff soon.
> 
> I'd suggest that you focus your energy on the StorageEngine API and HBase
> implementation.  If you're up for it, let's do a quick skype chat to sync
> up.  Let me know your availability over the next few days.
> 
> Thanks,
> Jacques
> 
> 
> 
> On Fri, Mar 15, 2013 at 6:59 PM, David Alves <davidralves@gmail.com> wrote:
> > that'd be great thanks.
> > 
> > -david
> > 
> > On Mar 15, 2013, at 8:51 PM, Jacques Nadeau <jacques.drill@gmail.com>
> > wrote:
> > 
> > > I've been under the weather the last few days and haven't made much
> > > progress. Let me see if I can get you something tomorrow.
> > > 
> > > On Mar 15, 2013, at 2:36 PM, David Alves <davidralves@gmail.com> wrote:
> > > > Hi Jacques
> > > > 
> > > > Is there any chance we could get a preview of this physical plan
> > > > stuff and basic plumbing for distributed execution before the
> > > > weekend? Maybe in a github branch somewhere?
> > > > I mean it doesn't have to be complete or even running, I'd just
> > > > like to make some progress with other stuff and keeping it in line
> > > > with whichever plumbing you already have would be great.
> > > > 
> > > > Best
> > > > David
> > > > 
> > > > On Mar 13, 2013, at 3:12 PM, Jacques Nadeau <jacques@apache.org> wrote:
> > > > > I'm working on some physical plan stuff as well as some basic
> > > > > plumbing for distributed execution.  It's very much in progress so
> > > > > I need to clean things up a bit before we could collaborate/divide
> > > > > and conquer on it.  Depending on your timing and availability,
> > > > > maybe I could put some of this together in the next couple days so
> > > > > that you could plug in rather than reinvent.  In the meantime,
> > > > > pushing forward the builder stuff, additional test cases on the
> > > > > reference interpreter and/or thinking through the logical plan
> > > > > storage engine pushdown/rewrite could be very useful.
> > > > > 
> > > > > Let me know your thoughts.
> > > > > 
> > > > > thanks,
> > > > > Jacques
> > > > > 
> > > > > On Wed, Mar 13, 2013 at 9:47 AM, David Alves <davidralves@gmail.com> wrote:
> > > > > 
> > > > > > Hi Jacques
> > > > > > 
> > > > > > I can assign issues to me now, thanks.
> > > > > > What you say wrt the logical/physical/execution layers sounds good.
> > > > > > My main concern, for the moment, is to have something working as
> > > > > > fast as possible, i.e. some daemons that I'd be able to deploy to a
> > > > > > working hbase cluster and send work to in some form (first step
> > > > > > would be to treat it as a non-distributed engine where each daemon
> > > > > > runs an instance of the prototype).
> > > > > > Here's where I'd like to go next:
> > > > > > - lay the groundwork for the daemons (scripts/rpc iface/wiring
> > > > > > protocol).
> > > > > > - create an execution engine iface that abstracts future
> > > > > > implementations, and make it available through the rpc iface. This
> > > > > > would sit in front of the ref impl for now and would be replaced by
> > > > > > cpp down the line.
> > > > > > 
> > > > > > I think we can probably concentrate on the capabilities iface a
> > > > > > bit down the line but, as a first approach, I see it simply
> > > > > > providing a simple set of ops that it is able to run internally.
> > > > > > How to abstract locality/partitioning/schema capabilities is still
> > > > > > not clear to me though, thoughts?
> > > > > > 
> > > > > > David
> > > > > > 
> > > > > > On Mar 13, 2013, at 11:12 AM, Jacques Nadeau <jacques@apache.org> wrote:
> > > > > > 
> > > > > > > I'm working on a presentation that will better illustrate the
> > > > > > > layers. There are actually three key plans.  Thinking to date has
> > > > > > > been to break the plans down into logical, physical and execution.
> > > > > > > The third hasn't been expressed well here and is entirely an
> > > > > > > internal domain to the execution engine.  Following some classic
> > > > > > > methods: Logical expresses what we want to do, Physical expresses
> > > > > > > how we want to do it (adding points of parallelization but not
> > > > > > > specifying particular amounts of parallelization or node by node
> > > > > > > assignments).  The execution engine is then responsible for
> > > > > > > determining the amount of parallelization of a particular plan
> > > > > > > along with system load (likely leveraging Berkeley's Sparrow
> > > > > > > work), task priority and specific data locality information,
> > > > > > > building sub-dags to be assigned to individual nodes and executing
> > > > > > > the plan.
> > > > > > > 
> > > > > > > So in the higher logical and physical levels, a single Scan and
> > > > > > > subsequent ScanPOP should be okay...  (ScanROPs have a separate
> > > > > > > problem since they ignore the level of separation we're planning
> > > > > > > for the real execution layer.  This is why the current ref impl
> > > > > > > turns a single Scan into potentially a union of ScanROPs... not
> > > > > > > elegant but logically correct.)
> > > > > > > 
> > > > > > > The capabilities interface still needs to be defined for how a
> > > > > > > storage engine reveals its logical capabilities and thus consumes
> > > > > > > part of the plan.
> > > > > > > 
> > > > > > > J
> > > > > > > 
> > > > > > > 
> > > > > > > On Tue, Mar 12, 2013 at 10:19 PM, David Alves <davidralves@gmail.com> wrote:
> > > > > > > 
> > > > > > > > Hi Lisen
> > > > > > > > 
> > > > > > > > Some of what you are saying, like push down of ops such as
> > > > > > > > filter, projection or partial aggregation below the storage
> > > > > > > > engine scanner level, or sub tree execution, is actively being
> > > > > > > > discussed in issues DRILL-13 (Storage Engine Interface) and
> > > > > > > > DRILL-15 (HBase storage engine); your input in these issues is
> > > > > > > > most welcome.
> > > > > > > > 
> > > > > > > > HBase in particular has the notion of
> > > > > > > > endpoints/coprocessors/filters that allow pushing this down
> > > > > > > > easily (this is also in line with what other parallel database
> > > > > > > > over nosql implementations like tajo do).
> > > > > > > > A possible approach is to have the optimizer change the order of
> > > > > > > > the ops to place them below the storage engine scanner and let
> > > > > > > > the SE impl deal with it internally.
> > > > > > > > 
> > > > > > > > There are also some other pieces missing at the moment AFAIK,
> > > > > > > > like a distributed metadata store, the drill daemons, wiring,
> > > > > > > > etc.
> > > > > > > > So in summary, you're absolutely right, and if you're
> > > > > > > > particularly interested in the HBase SE impl (as I am, for the
> > > > > > > > moment) I'd be interested in collaborating.
> > > > > > > > 
> > > > > > > > Best
> > > > > > > > David
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Mar 12, 2013, at 11:44 PM, Lisen Mu <immars@gmail.com> wrote:
> > > > > > > > > Hi David,
> > > > > > > > > 
> > > > > > > > > Very nice to see your effort on this.
> > > > > > > > > 
> > > > > > > > > Hi Jacques,
> > > > > > > > > 
> > > > > > > > > We are also extending the drill prototype, to see if there is
> > > > > > > > > any chance to meet our production needs. However, we find that
> > > > > > > > > implementing a performant HBase storage engine is not so
> > > > > > > > > straightforward, and requires some workarounds. The problem is
> > > > > > > > > in the Scan interface.
> > > > > > > > > 
> > > > > > > > > In drill's physical plan model, ScanROP is in charge of table
> > > > > > > > > scan. A storage engine provides output for a whole data source,
> > > > > > > > > a csv file for example. That's sufficient for an input source
> > > > > > > > > like a plain file, but for hbase, it's not very efficient, if
> > > > > > > > > not impossible, to let ScanROP retrieve a whole htable into
> > > > > > > > > drill. Storage engines like HBase should have some ability to
> > > > > > > > > do part of the DrQL query, like Filter, if a filter can be
> > > > > > > > > performed by specifying startRowKey and endRowKey. A storage
> > > > > > > > > engine like mysql could do more, even Join.
> > > > > > > > > 
> > > > > > > > > Generally, it would be clearer if a ScanROP is mapped to a
> > > > > > > > > sub-DAG of the logical plan DAG instead of a single Scan node
> > > > > > > > > in the logical plan. If so, more implementation-specific
> > > > > > > > > information would be coupled into the plan optimization &
> > > > > > > > > transformation phase. I guess that's the price to pay when
> > > > > > > > > optimization comes, or is there another way I failed to see?
> > > > > > > > > 
> > > > > > > > > Please correct me if anything is wrong.
> > > > > > > > > 
> > > > > > > > > thanks,
> > > > > > > > > 
> > > > > > > > > Lisen
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Wed, Mar 13, 2013 at 9:33 AM, David Alves <davidralves@gmail.com> wrote:
> > > > > > > > > 
> > > > > > > > > > Hi Jacques
> > > > > > > > > > 
> > > > > > > > > > I've submitted a first pass patch to DRILL-15.
> > > > > > > > > > I did this mostly because HBase will be my main target and
> > > > > > > > > > because I wanted to get a feel for what would be a nice
> > > > > > > > > > interface for DRILL-13. I have some thoughts that I will post
> > > > > > > > > > soon.
> > > > > > > > > > btw: I still can't assign issues to myself in JIRA, did you
> > > > > > > > > > forget to add me as a contributor?
> > > > > > > > > > 
> > > > > > > > > > Best
> > > > > > > > > > David
> > > > > > > > > > 
> > > > > > > > > > On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <jacques@apache.org> wrote:
> > > > > > > > > > 
> > > > > > > > > > > Hey David,
> > > > > > > > > > > 
> > > > > > > > > > > These sound good.  I've added you as a contributor on jira
> > > > > > > > > > > so you can assign tasks to yourself.  I think 45 and 46 are
> > > > > > > > > > > good places to start.  15 depends on 13 and working on the
> > > > > > > > > > > two hand in hand would probably be a good idea.  Maybe we
> > > > > > > > > > > could do a design discussion on 15 and 13 here once you have
> > > > > > > > > > > some time to focus on it.
> > > > > > > > > > > 
> > > > > > > > > > > Jacques
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On Mon, Mar 11, 2013 at 3:02 AM, David Alves <davidralves@gmail.com> wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > Hi All
> > > > > > > > > > > > 
> > > > > > > > > > > > I have a new academic project for which I'd like to use
> > > > > > > > > > > > drill since none of the other parallel database over
> > > > > > > > > > > > hadoop/nosql implementations fit just right.
> > > > > > > > > > > > To this end I've been tinkering with the prototype, trying
> > > > > > > > > > > > to find where I'd be most useful.
> > > > > > > > > > > > 
> > > > > > > > > > > > Here's where I'd like to start, if you agree:
> > > > > > > > > > > > - implement HBase storage engine (DRILL-15)
> > > > > > > > > > > > - start with simple scanning and push down of
> > > > > > > > > > > > selection/projection
> > > > > > > > > > > > - implement the LogicalPlanBuilder (DRILL-45)
> > > > > > > > > > > > - setup coding style in the wiki (formatting/imports etc,
> > > > > > > > > > > > DRILL-46)
> > > > > > > > > > > > - create builders for all logical plan elements/make
> > > > > > > > > > > > logical plans immutable (no issue for this, I'd like to
> > > > > > > > > > > > hear your thoughts first).
> > > > > > > > > > > > 
> > > > > > > > > > > > Please let me know your thoughts, and if you agree please
> > > > > > > > > > > > assign the issues to me (it seems that I can't assign them
> > > > > > > > > > > > myself).
> > > > > > > > > > > > Best
> > > > > > > > > > > > David Alves

