Hi Jacques

	I can assign issues to myself now, thanks.
	What you say with respect to the logical/physical/execution layers sounds good.
	My main concern, for the moment, is to have something working as fast as possible, i.e. some daemons that I'd be able to deploy to a working HBase cluster and send them work to do in some form (a first step would be to treat it as a non-distributed engine where each daemon runs an instance of the prototype).
	Here's where I'd like to go next:
	- lay the groundwork for the daemons (scripts/rpc iface/wiring protocol).
	- create an execution engine iface that abstracts future implementations, and make it available through the rpc iface. This would sit in front of the ref impl for now and would be replaced by cpp down the line.
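	A minimal sketch of the second point, under stated assumptions: every name here (`ExecutionEngine`, `QueryHandle`, `ReferenceEngine`, `DrillDaemon`) is hypothetical and not taken from the prototype, and the "reference" engine only echoes the plan instead of running it. The point is just that the daemon depends on the interface alone, so a cpp-backed implementation could later be swapped in.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical handle for a submitted plan; not part of the Drill prototype. */
final class QueryHandle {
    final int id;
    QueryHandle(int id) { this.id = id; }
}

/** Hypothetical engine-facing interface that the daemon's rpc layer would call. */
interface ExecutionEngine {
    QueryHandle submit(String planJson);
    List<String> fetchResults(QueryHandle handle);
}

/** Stand-in for the reference implementation: it only records and echoes plans. */
final class ReferenceEngine implements ExecutionEngine {
    private final List<String> plans = new ArrayList<>();

    @Override
    public QueryHandle submit(String planJson) {
        plans.add(planJson);
        return new QueryHandle(plans.size() - 1);
    }

    @Override
    public List<String> fetchResults(QueryHandle handle) {
        // A real engine would execute the plan; this placeholder just reports it.
        List<String> out = new ArrayList<>();
        out.add("executed: " + plans.get(handle.id));
        return out;
    }
}

/** Daemon-side wiring: depends only on the interface, so the engine can be replaced. */
final class DrillDaemon {
    private final ExecutionEngine engine;

    DrillDaemon(ExecutionEngine engine) { this.engine = engine; }

    /** What an rpc handler might do with an incoming plan. */
    List<String> handleRequest(String planJson) {
        return engine.fetchResults(engine.submit(planJson));
    }
}
```

	In the non-distributed first step, each daemon would construct its own engine instance, e.g. `new DrillDaemon(new ReferenceEngine())`.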
	I think we can probably concentrate on the capabilities iface a bit down the line but, as a first approach, I see it simply providing a simple set of ops that it is able to run internally.
	How to abstract locality/partitioning/schema capabilities is still not clear to me though, thoughts?
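	To make the "simple set of ops" idea concrete, here is a rough sketch. All names (`OpKind`, `StorageEngineCapabilities`, `HBaseLikeCapabilities`, `PushDownPlanner`) are invented for illustration, and the op set assumed for an HBase-like engine is a guess, not something settled in DRILL-13. Locality, partitioning and schema are deliberately left out, since how to abstract them is the open question.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical operator kinds; the names are illustrative only. */
enum OpKind { SCAN, FILTER, PROJECT, PARTIAL_AGGREGATE, JOIN }

/** Hypothetical capabilities interface: the ops an engine can run internally. */
interface StorageEngineCapabilities {
    Set<OpKind> supportedOps();
}

/** Assumed capability set for an HBase-like engine (scan, filter, project). */
final class HBaseLikeCapabilities implements StorageEngineCapabilities {
    @Override
    public Set<OpKind> supportedOps() {
        return EnumSet.of(OpKind.SCAN, OpKind.FILTER, OpKind.PROJECT);
    }
}

final class PushDownPlanner {
    /**
     * Returns the leading ops (ordered bottom-up from the scan) that the engine
     * can absorb. Stops at the first unsupported op, because every op above it
     * consumes that op's output and so cannot run inside the engine either.
     */
    static List<OpKind> pushable(List<OpKind> opsBottomUp, StorageEngineCapabilities caps) {
        Set<OpKind> supported = caps.supportedOps();
        List<OpKind> pushed = new ArrayList<>();
        for (OpKind op : opsBottomUp) {
            if (!supported.contains(op)) {
                break;
            }
            pushed.add(op);
        }
        return pushed;
    }
}
```

	With this shape, an optimizer would ask the engine what it supports and hand over only the prefix of the plan the engine can take, keeping the rest in the execution engine.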

David

On Mar 13, 2013, at 11:12 AM, Jacques Nadeau <jacques@apache.org> wrote:

> I'm working on a presentation that will better illustrate the layers.
> There are actually three key plans.  Thinking to date has been to break
> the plans down into logical, physical and execution.  The third hasn't been
> expressed well here and is entirely an internal domain to the execution
> engine.  Following some classic methods: Logical expresses what we want to
> do, Physical expresses how we want to do it (adding points of
> parallelization but not specifying particular amounts of parallelization or
> node by node assignments).  The execution engine is then responsible for
> determining the amount of parallelization of a particular plan along with
> system load (likely leveraging Berkeley's Sparrow work), task priority and
> specific data locality information, building sub-dags to be assigned to
> individual nodes and executing the plan.
>
> So in the higher logical and physical levels, a single Scan and subsequent
> ScanPOP should be okay...  (ScanROPs have a separate problem since they
> ignore the level of separation we're planning for the real execution layer.
> This is why the current ref impl turns a single Scan into potentially
> a union of ScanROPs... not elegant but logically correct.)
>
> The capabilities interface still needs to be defined for how a storage
> engine reveals its logical capabilities and thus consumes part of the plan.
>
> J
>
>
> On Tue, Mar 12, 2013 at 10:19 PM, David Alves <davidralves@gmail.com> wrote:
>
>> Hi Lisen
>>
>>        Some of what you are saying, like push down of ops like filter,
>> projection or partial aggregation below the storage engine scanner level,
>> or sub tree execution, is actively being discussed in issues DRILL-13
>> (Storage Engine Interface) and DRILL-15 (HBase storage engine); your input
>> in these issues is most welcome.
>>
>>        HBase in particular has the notion of
>> endpoints/coprocessors/filters that allow pushing this down easily (this is
>> also in line with what other parallel database over nosql implementations
>> like tajo do).
>>        A possible approach is to have the optimizer change the order of
>> the ops to place them below the storage engine scanner and let the SE impl
>> deal with it internally.
>>
>>        There are also some other pieces missing at the moment AFAIK, like
>> a distributed metadata store, the drill daemons, wiring, etc.
>>
>>        So in summary, you're absolutely right, and if you're particularly
>> interested in the HBase SE impl (as I am, for the moment) I'd be interested
>> in collaborating.
>>
>> Best
>> David
>>
>>
>> On Mar 12, 2013, at 11:44 PM, Lisen Mu <immars@gmail.com> wrote:
>>
>>> Hi David,
>>>
>>> Very nice to see your effort on this.
>>>
>>> Hi Jacques,
>>>
>>> We are also extending the drill prototype, to see if there is any chance to
>>> meet our production needs. However, we find that implementing a performant
>>> HBase storage engine is not such straightforward work, and requires some
>>> workarounds. The problem is in the Scan interface.
>>>
>>> In drill's physical plan model, ScanROP is in charge of table scan. The storage
>>> engine provides output for a whole data source, a csv file for example.
>>> That's sufficient for an input source like a plain file, but for hbase, it's not
>>> very efficient, if not impossible, to let ScanROP retrieve a whole htable
>>> into drill. Storage engines like HBase should have some ability to do part
>>> of the DrQL query, like Filter, if a filter can be performed by specifying
>>> startRowKey and endRowKey. A storage engine like mysql could do more, even
>>> Join.
>>>
>>> Generally, it would be clearer if a ScanROP were mapped to a sub-DAG of the
>>> logical plan DAG instead of a single Scan node in the logical plan. If so, more
>>> implementation-specific information would be coupled into the plan optimization
>>> & transformation phase. I guess that's the price to pay when optimization
>>> comes, or is there another way I failed to see?
>>>
>>> Please correct me if anything is wrong.
>>>
>>> thanks,
>>>
>>> Lisen
>>>
>>>
>>>
>>> On Wed, Mar 13, 2013 at 9:33 AM, David Alves <davidralves@gmail.com> wrote:
>>>
>>>> Hi Jacques
>>>>
>>>>       I've submitted a first pass patch to DRILL-15.
>>>>       I did this mostly because HBase will be my main target and because
>>>> I wanted to get a feel of what would be a nice interface for DRILL-13. I have
>>>> some thoughts that I will post soon.
>>>>       btw: I still can't assign issues to myself in JIRA, did you forget
>>>> to add me as a contributor?
>>>>
>>>> Best
>>>> David
>>>>
>>>> On Mar 11, 2013, at 2:13 PM, Jacques Nadeau <jacques@apache.org> wrote:
>>>>
>>>>> Hey David,
>>>>>
>>>>> These sound good.  I've added you as a contributor on jira so you can assign
>>>>> tasks to yourself.  I think 45 and 46 are good places to start.  15 depends
>>>>> on 13 and working on the two hand in hand would probably be a good idea.
>>>>> Maybe we could do a design discussion on 15 and 13 here once you have some
>>>>> time to focus on it.
>>>>>
>>>>> Jacques
>>>>>
>>>>>
>>>>> On Mon, Mar 11, 2013 at 3:02 AM, David Alves <davidralves@gmail.com> wrote:
>>>>>
>>>>>> Hi All
>>>>>>
>>>>>>      I have a new academic project for which I'd like to use drill
>>>>>> since none of the other parallel database over hadoop/nosql implementations
>>>>>> fit just right.
>>>>>>      To this end I've been tinkering with the prototype trying to find
>>>>>> where I'd be most useful.
>>>>>>
>>>>>>      Here's where I'd like to start, if you agree:
>>>>>>      - implement HBase storage engine (DRILL-15)
>>>>>>              - start with simple scanning and push down of
>>>>>> selection/projection
>>>>>>      - implement the LogicalPlanBuilder (DRILL-45)
>>>>>>      - set up coding style in the wiki (formatting/imports etc, DRILL-46)
>>>>>>      - create builders for all logical plan elements/make logical plans
>>>>>> immutable (no issue for this, I'd like to hear your thoughts first).
>>>>>>
>>>>>>      Please let me know your thoughts, and if you agree please assign
>>>>>> the issues to me (it seems that I can't assign them myself).
>>>>>>
>>>>>> Best
>>>>>> David Alves
>>>>
>>>>
>>
>>