[prev in list] [next in list] [prev in thread] [next in thread]
List: mondrian
Subject: [Mondrian] Blog post: Improved collections classes for Mondrian's
From: "Julian Hyde" <jhyde () pentaho ! com>
Date: 2010-02-17 2:37:40
Message-ID: A7361BCB43AA4B3DA547DC546AFA0EB7 () mackerel
[Download RAW message or body]
[Attachment #2 (multipart/alternative)]
I just posted an article "Improved collections classes for Mondrian's query
execution process" to my blog. I'd like to start a design discussion.
http://julianhyde.blogspot.com/2010/02/improved-collections-classes-for.html
Julian
Mondrian calculations work predominantly over lists of members and tuples.
Internally, Mondrian represents lists of members as List, and represents
lists of tuples as List. (And similarly for iterators over members and
tuples.)
There are two problems with this. First, the representation of tuple lists
requires an array to be allocated for each element of the list. Allocations
cost time and memory. (Granted, we could allocate temporary arrays only when
tuples are accessed, which would cost only time. But according to my latest
round of profiling, the effort of allocating lots small arrays is
significant.)
Second, the code to deal with members and tuples has to be different. The
most extreme example of this found in the implementation of CrossJoin. There
are over 30 inner classes in CrossJoinFunDef.java, to deal with the
permutations of iterator vs. list, mutable vs. immutable, and tuple vs.
member.
In short, the java standard List and Iterator classes are not serving us
well. I think it's appropriate to introduce classes/interfaces that handle
members and tuples more uniformly, and can store, access, and iterate over
collections without lots of small arrays being created.
Here are some collection classes that I think would serve the purpose:
interface TupleList {
int size();
int arity();
Member getMember(int index, int ordinal);
TupleIterator tupleIterator();
}
interface TupleIterator {
int arity();
boolean hasNext();
// writes the members of the next tuple into given array
void next(Member[] members);
// appends members of the next tuple to given list
void next(List<Member> members);
}
If arity = 1 (i.e. if the list is just a collection of members) then
TupleList could easily be implemented using java.util.ArrayList.
For other arities, a list of tuples could be represented as a set of members
end-to-end. For instance, the list with two 3-tuple elements {(A1, B1, C1),
(A2, B2, C2)} would be held in a list {A1, B1, C1, A2, B2, C2} and
getMember(index, ordinal) would read element index * arity + ordinal of the
list.
Introducing these would require quite a few code changes, mostly in the
mondrian.olap.fun package, which is where the builtin functions are
implemented. There should be no changes to the user API or olap4j.
I am still debating whether this change makes sense. Usually this kind of
penny-pinching architectural change doesn't pay off. But some of them pay
off big. I've learned in Oracle, Broadbase, and SQLstream that for
high-performance data processing you shouldn't be doing any memory
allocations in an inner loop that is executed once per row. That isn't quite
practical in Java, but it's a goal to strive for. In today's CPU
architectures, where memory is slow and last-level-cache is fast, it pays to
keep data contiguous.
If you are a Mondrian developer, I'd be interested to hear what you think
about this proposed change.
[Attachment #5 (text/html)]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=us-ascii" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.18882"></HEAD>
<BODY>
<DIV><SPAN class=891143401-17022010><FONT color=#000080 size=2
face="Lucida Sans">I just posted an article "Improved collections classes for
Mondrian's query execution process" to my blog. I'd like to start a design
discussion.</FONT></SPAN></DIV>
<DIV><SPAN class=891143401-17022010><FONT color=#000080 size=2
face="Lucida Sans"></FONT></SPAN> </DIV>
<DIV><SPAN class=891143401-17022010><A
href="http://julianhyde.blogspot.com/2010/02/improved-collections-classes-for.html">ht \
tp://julianhyde.blogspot.com/2010/02/improved-collections-classes-for.html</A></SPAN></DIV>
<DIV><SPAN class=891143401-17022010><FONT color=#000080 size=2
face="Lucida Sans"></FONT></SPAN> </DIV>
<DIV><SPAN class=891143401-17022010><FONT color=#000080 size=2
face="Lucida Sans">Julian</FONT></SPAN></DIV>
<DIV><SPAN class=891143401-17022010><FONT color=#000080 size=2
face="Lucida Sans"></FONT></SPAN> </DIV>
<BLOCKQUOTE style="MARGIN-RIGHT: 0px" dir=ltr>
<DIV><SPAN class=891143401-17022010><SPAN
style="WIDOWS: 2; TEXT-TRANSFORM: none; TEXT-INDENT: 0px; BORDER-COLLAPSE: \
separate; FONT: medium 'Times New Roman'; WHITE-SPACE: normal; ORPHANS: 2; \
LETTER-SPACING: normal; COLOR: rgb(0,0,0); WORD-SPACING: 0px; \
-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; \
-webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; \
-webkit-text-stroke-width: 0px" class=Apple-style-span><SPAN
style="TEXT-ALIGN: left; LINE-HEIGHT: 20px; FONT-FAMILY: 'Trebuchet MS', Trebuchet, \
Verdana, sans-serif; COLOR: rgb(204,204,204); FONT-SIZE: 13px" \
class=Apple-style-span><FONT color=#000000>Mondrian calculations work predominantly \
over lists of members and tuples. Internally, Mondrian represents lists of members \
as List<MEMBER>, and represents lists of tuples as List<MEMBER[]>. (And similarly \
for iterators over members and tuples.)<BR><BR>There are two problems with this. \
First, the representation of tuple lists requires an array to be allocated for each \
element of the list. Allocations cost time and memory. (Granted, we could allocate \
temporary arrays only when tuples are accessed, which would cost only time. But \
according to my latest round of profiling, the effort of allocating lots small \
arrays is significant.)<BR><BR>Second, the code to deal with members and tuples has \
to be different. The most extreme example of this found in the implementation of
CrossJoin. There are over 30 inner classes in CrossJoinFunDef.java, to deal
with the permutations of iterator vs. list, mutable vs. immutable, and tuple
vs. member.<BR><BR>In short, the java standard List and Iterator classes are
not serving us well. I think it's appropriate to introduce classes/interfaces
that handle members and tuples more uniformly, and can store, access, and
iterate over collections without lots of small arrays being
created.<BR><BR>Here are some collection classes that I think would serve the
purpose:<BR></FONT>
<BLOCKQUOTE style="LINE-HEIGHT: 1.3em; MARGIN: 1em 20px"><FONT
color=#000000>interface TupleList {<BR> int
size();<BR> int
arity();<BR> Member getMember(int index, int
ordinal);<BR> TupleIterator
tupleIterator();<BR>}<BR><BR>interface TupleIterator
{<BR> int arity();<BR> boolean
hasNext();<BR> // writes the members of the next
tuple into given array<BR> void next(Member[]
members);<BR> // appends members of the next tuple to
given list<BR> void next(List<Member>
members);<BR>}</FONT></BLOCKQUOTE><FONT color=#000000>If arity = 1 (i.e. if
the list is just a collection of members) then TupleList could easily be
implemented using java.util.ArrayList.<BR><BR>For other arities, a list of
tuples could be represented as a set of members end-to-end. For instance, the
list with two 3-tuple elements {(A1, B1, C1), (A2, B2, C2)} would be held in a
list {A1, B1, C1, A2, B2, C2} and getMember(index, ordinal) would read element
index * arity + ordinal of the list.<BR><BR>Introducing these would require
quite a few code changes, mostly in the mondrian.olap.fun package, which is
where the builtin functions are implemented. There should be no changes to the
user API or olap4j.<BR><BR>I am still debating whether this change makes
sense. Usually this kind of penny-pinching architectural change doesn't pay
off. But some of them pay off big. I've learned in Oracle, Broadbase, and
SQLstream that for high-performance data processing you shouldn't be doing any
memory allocations in an inner loop that is executed once per row. That isn't
quite practical in Java, but it's a goal to strive for. In today's CPU
architectures, where memory is slow and last-level-cache is fast, it pays to
keep data contiguous.<BR><BR>If you are a Mondrian developer, I'd be
interested to hear what you think about this proposed
change.</MEMBER[]></MEMBER></FONT></SPAN></SPAN></SPAN></DIV></BLOCKQUOTE></BODY></HTML>
_______________________________________________
Mondrian mailing list
Mondrian@pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic