[prev in list] [next in list] [prev in thread] [next in thread] 

List:       mondrian
Subject:    Re: [Mondrian] Non Empty not respected with aggregated tables when sets	are used
From:       Diethard Steiner <diethard.steiner () gmail ! com>
Date:       2014-10-03 18:06:56
Message-ID: 02577833-353D-44B5-B75D-D84EF6807B18 () gmail ! com
[Download RAW message or body]

Thanks Matt!

For the reference, this is the jira case I created:
http://jira.pentaho.com/browse/MONDRIAN-2215

On 3 Oct 2014, at 18:39, Matt Campbell <mcampbell@pentaho.com> wrote:

> Agreed, you should see consistent results in both cases.
> 
> Glad flipping highCardinality gives expected results, but I don't have an immediate \
> guess about exactly why there's a difference.  Go ahead and put in the Jira case. 
> Thanks!
> 
> -----Original Message-----
> From: mondrian-bounces@pentaho.org [mailto:mondrian-bounces@pentaho.org] On Behalf \
>                 Of Diethard Steiner
> Sent: Friday, October 03, 2014 1:20 PM
> To: Mondrian developer mailing list
> Subject: Re: [Mondrian] Non Empty not respected with aggregated tables when sets \
> are used 
> Nope, they are gone now.
> 
> But in any case, I would say that the result set should be the same no matter if I \
> set highCardinality to true or false, meaning NON EMPTY should be respected in both \
> cases. I can create a jira case for this if you want. 
> On 3 Oct 2014, at 18:16, Matt Campbell <mcampbell@pentaho.com> wrote:
> 
> > 
> > Do you still see the large number of segment load queries (one per month)?
> > 
> > 
> > -----Original Message-----
> > From: mondrian-bounces@pentaho.org 
> > [mailto:mondrian-bounces@pentaho.org] On Behalf Of Diethard Steiner
> > Sent: Friday, October 03, 2014 12:40 PM
> > To: Mondrian developer mailing list
> > Subject: Re: [Mondrian] Non Empty not respected with aggregated tables 
> > when sets are used
> > 
> > Hi Matt,
> > 
> > Thanks for the reply!  
> > 
> > Now this is very interesting: setting highCardinality to false yields the \
> > expected result. Thanks a lot for this hint! 
> > Should I log a jira case for this? 
> > 
> > The Mondrian version used is: 3.6.5
> > Saiku: biserver plugin, Paul's fork 
> > (http://ci.analytical-labs.com/job/pstoellberger-saiku3/lastSuccessful
> > Build/artifact/saiku-bi-platform-plugin/target/saiku-plugin-3.0-PSTOEL
> > LBERGER-SNAPSHOT.zip)
> > 
> > I have one global date dimension defined, and then in the individual cubes I \
> > reference it like this: <DimensionUsage source="Date" name="Start Date" \
> > caption="Start Date"  visible="true" foreignKey="startdate_tk" \
> > highCardinality="false"/> 
> > In another other cubes, which do not have agg tables, I did not see the original \
> > problem. 
> > Cheers,
> > Diethard
> > 
> > 
> > On 3 Oct 2014, at 16:53, Matt Campbell <mcampbell@pentaho.com> wrote:
> > 
> > > 
> > > Hi Diethard, greetings from rainy Michigan.
> > > 
> > > I notice your <Dimension> below is named [Date], but the MDX shows [Start \
> > > Date].  There also seems to be an inconsistency in the logged output.  Is there \
> > > more than one Date dimension in your model, or did it just get renamed \
> > > somewhere along the way? 
> > > I'm surprised both that you see empty rows for the first query, and also that \
> > > you see a segment load query per Month.  I would expect a single segment load \
> > > query against the agg table. 
> > > Which Saiku build are you seeing this with?  I'll see which version of Mondrian \
> > > it's using and try to reproduce. 
> > > Out of curiosity, I wonder if you would see the same behavior if you set \
> > > highCardinality="false" on the dim. 
> > > -matt
> > > 
> > > -----Original Message-----
> > > From: mondrian-bounces@pentaho.org
> > > [mailto:mondrian-bounces@pentaho.org] On Behalf Of Diethard Steiner
> > > Sent: Friday, October 03, 2014 10:44 AM
> > > To: Mondrian developer mailing list
> > > Subject: [Mondrian] Non Empty not respected with aggregated tables 
> > > when sets are used
> > > 
> > > Good afternoon everyone from a sunny London! 
> > > 
> > > I came across an interesting problem today:
> > > 
> > > There seems to be an issue in regards to handling NON Empty in one of the \
> > > latest Saiku builds with Mondrian 3.x (well, I haven't tested this with \
> > > anything else yet). The problem manifests itself only when using aggregated \
> > > tables. 
> > > The Schema:
> > > 
> > > Global Date Dim:
> > > 
> > > <Dimension name="Date" type="TimeDimension" visible="true" \
> > > highCardinality="true"> <Hierarchy name="Date" visible="true" hasAll="true" \
> > > allMemberName="Total" allMemberCaption="Total" primaryKey="date_tk"> <Table \
> > > name="dim_date" schema="common"/> <Level name="Date" column="date_digits" \
> > > type="Date" uniqueMembers="true" levelType="TimeDays" hideMemberIf="Never" \
> > > visible="true" /> </Hierarchy> 
> > > <Hierarchy name="Monthly Calendar" caption="Monthly Calendar" visible="true" \
> > > hasAll="true" allMemberName="Total" allMemberCaption="Total" \
> > > primaryKey="date_tk"> <Table name="dim_date" schema="common"/>
> > > <Level 
> > > name="Quarter" 
> > > column="year_quarter_int"
> > > type="Integer"  
> > > captionColumn="year_quarter"
> > > nameColumn="quarter_number"
> > > ordinalColumn="year_quarter_int"  
> > > uniqueMembers="false" levelType="TimeQuarters" hideMemberIf="Never" \
> > > visible="true" /> <Level name="Month" column="year_month_int" type="Integer" \
> > > nameColumn="year_month" ordinalColumn="year_month_int" uniqueMembers="false" \
> > > levelType="TimeMonths" hideMemberIf="Never" visible="true" /> <Level name="Day \
> > > of the Month" column="date_tk" type="Integer" nameColumn="day_in_month" \
> > > ordinalColumn="date_tk" uniqueMembers="true" levelType="TimeDays" \
> > > hideMemberIf="Never" visible="true" /> </Hierarchy>	
> > > <Hierarchy name="Weekly Calendar" caption="Weekly Calendar" visible="true" \
> > > hasAll="true" allMemberName="Total" allMemberCaption="Total" \
> > > primaryKey="date_tk"> <Table name="dim_date" schema="common"/>
> > > <Level name="Year" column="year_week_based" type="Integer" \
> > > uniqueMembers="false" levelType="TimeYears" hideMemberIf="Never" visible="true" \
> > > /> <Level name="Week" column="year_week_int" type="String" \
> > > nameColumn="year_week"  ordinalColumn="year_week_int" uniqueMembers="false" \
> > > levelType="TimeWeeks" hideMemberIf="Never" visible="true" /> <Level name="Day \
> > > of the Week" column="date_tk" type="Integer" nameColumn="day_in_week" \
> > > ordinalColumn="date_tk" uniqueMembers="true" levelType="TimeDays" \
> > > hideMemberIf="Never" visible="true" /> </Hierarchy>        
> > > </Dimension>
> > > 
> > > The Cube has the monthly aggr table referenced like this:
> > > 
> > > <AggName name="agg_monthly_fact_cdr" ignorecase="true">
> > > <AggFactCount column="fact_count">
> > > </AggFactCount>
> > > <AggForeignKey factColumn="service_tk" aggColumn="service_tk">
> > > </AggForeignKey>
> > > 	[ ... many more ... ]
> > > <AggMeasure column="calls" name="[Measures].[CDR Count]">
> > > </AggMeasure>
> > > 	[ ... many more ... ]
> > > <AggLevel column="startdate_year" name="[Start Date.Monthly Calendar].[Year]" \
> > > collapsed="true"></AggLevel> <AggLevel column="startdate_quarter_number" \
> > > name="[Start Date.Monthly Calendar].[Quarter]" collapsed="true"></AggLevel><!- \
> > > holds year_quarter_int value -> <AggLevel column="startdate_month_number" \
> > > name="[Start Date.Monthly Calendar].[Month]" collapsed="true"></AggLevel><!- \
> > > holds year_month_int value -> </AggName>
> > > 
> > > EXAMPLE
> > > ========
> > > 
> > > Both cases below should in theory return the same result set - the query was \
> > > slightly changed for each case. 
> > > CASE 1
> > > -----------
> > > 
> > > Saiku creates following MDX query:
> > > WITH
> > > SET [~ROWS] AS
> > > {[Start Date.Monthly Calendar].[Month].Members} SELECT NON EMPTY 
> > > {[Measures].[CDR Count]} ON COLUMNS, NON EMPTY [~ROWS] ON ROWS FROM 
> > > [CDR]
> > > 
> > > 
> > > Mondrian log:
> > > 
> > > 2014-10-03 15:13:45,894 DEBUG [mondrian.mdx] 4745: with set [~ROWS] as '{[Start \
> > > Date.Monthly Calendar].[Month].Members}' select NON EMPTY {[Measures].[CDR \
> > > Count]} ON COLUMNS,  NON EMPTY  [~ROWS] ON ROWS from [CDR]
> > > 
> > > 2014-10-03 15:13:45,939 DEBUG [mondrian.sql] 325: 
> > > HighCardSqlTupleReader.readTuples [[Date.Monthly Calendar].[Month]]: 
> > > executing sql [select "dim_date"."year4" as "c0", 
> > > "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as 
> > > "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int"
> > > as "c4", "dim_date"."year_month" as "c5" from "common"."dim_date" as 
> > > "dim_date" group by "dim_date"."year4", 
> > > "dim_date"."year_quarter_int", "dim_date"."year_quarter", 
> > > "dim_date"."quarter_number", "dim_date"."year_month_int", 
> > > "dim_date"."year_month" order by "dim_date"."year4" ASC NULLS LAST, 
> > > "dim_date"."year_quarter_int" ASC NULLS LAST, 
> > > "dim_date"."year_month_int" ASC NULLS LAST]
> > > 2014-10-03 15:13:46,044 DEBUG [mondrian.sql] 325: , exec 104 ms
> > > 2014-10-03 15:13:46,053 DEBUG [mondrian.sql] 325: , exec+fetch 114 
> > > ms,
> > > 241 rows
> > > 2014-10-03 15:13:46,096 DEBUG [mondrian.sql] 326: 
> > > HighCardSqlTupleReader.readTuples [[Date.Monthly Calendar].[Month]]: 
> > > executing sql [select "dim_date"."year4" as "c0", 
> > > "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as 
> > > "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int"
> > > as "c4", "dim_date"."year_month" as "c5" from "common"."dim_date" as 
> > > "dim_date" group by "dim_date"."year4", 
> > > "dim_date"."year_quarter_int", "dim_date"."year_quarter", 
> > > "dim_date"."quarter_number", "dim_date"."year_month_int", 
> > > "dim_date"."year_month" order by "dim_date"."year4" ASC NULLS LAST, 
> > > "dim_date"."year_quarter_int" ASC NULLS LAST, 
> > > "dim_date"."year_month_int" ASC NULLS LAST]
> > > 2014-10-03 15:13:46,153 DEBUG [mondrian.sql] 326: , exec 57 ms
> > > 2014-10-03 15:13:46,162 DEBUG [mondrian.sql] 326: , exec+fetch 66 ms,
> > > 241 rows
> > > 2014-10-03 15:13:46,206 DEBUG [mondrian.sql] 327: 
> > > SqlStatisticsProvider.getColumnCardinality: executing sql [select 
> > > count(distinct "year4") from "common"."dim_date"]
> > > 2014-10-03 15:13:46,251 DEBUG [mondrian.sql] 327: , exec 45 ms
> > > 2014-10-03 15:13:46,251 DEBUG [mondrian.sql] 327: , exec+fetch 45 ms,
> > > 1 rows
> > > 2014-10-03 15:13:46,296 DEBUG [mondrian.sql] 328: 
> > > SqlStatisticsProvider.getColumnCardinality: executing sql [select 
> > > count(distinct "year_quarter_int") from "common"."dim_date"]
> > > 2014-10-03 15:13:46,345 DEBUG [mondrian.sql] 328: , exec 48 ms
> > > 2014-10-03 15:13:46,345 DEBUG [mondrian.sql] 328: , exec+fetch 49 ms,
> > > 1 rows
> > > 2014-10-03 15:13:46,393 DEBUG [mondrian.sql] 329: 
> > > SqlStatisticsProvider.getColumnCardinality: executing sql [select 
> > > count(distinct "year_month_int") from "common"."dim_date"]
> > > 2014-10-03 15:13:46,445 DEBUG [mondrian.sql] 329: , exec 52 ms
> > > 2014-10-03 15:13:46,445 DEBUG [mondrian.sql] 329: , exec+fetch 52 ms,
> > > 1 rows
> > > 2014-10-03 15:13:46,490 DEBUG [mondrian.sql] 330: Segment.load: 
> > > executing sql [select "agg_monthly_fact_cdr"."startdate_year" as 
> > > "c0", "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", 
> > > "agg_monthly_fact_cdr"."startdate_month_number" as "c2",
> > > sum("agg_monthly_fact_cdr"."calls") as "m0" from 
> > > "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where 
> > > "agg_monthly_fact_cdr"."startdate_year" = 2005 and 
> > > "agg_monthly_fact_cdr"."startdate_quarter_number" = 200501 and 
> > > "agg_monthly_fact_cdr"."startdate_month_number" = 200501 group by 
> > > "agg_monthly_fact_cdr"."startdate_year",
> > > "agg_monthly_fact_cdr"."startdate_quarter_number",
> > > "agg_monthly_fact_cdr"."startdate_month_number"]
> > > 
> > > many more ... last one loops through all year-quarter-month 
> > > combinations from the date table
> > > 
> > > The strategy seems to be:
> > > 1) Get year - quarter - month combinations from dim date table [Not 
> > > too sure why this is executed twice]
> > > 2) Get distinct year, quarter, month values from dim date table
> > > 3) Get measures from agg table for every year-quarter-month 
> > > combination from the dim date table (so no matter if there are record 
> > > for this period in the agg table or not)
> > > 
> > > Results are showing everything (so EMPTY and NON EMPTY).
> > > 
> > > 
> > > CASE 2
> > > -----------
> > > 
> > > 
> > > Then I manually changed the Saiku MDX query to the following:
> > > 
> > > SELECT
> > > NON EMPTY {[Measures].[CDR Count]} ON COLUMNS, NON EMPTY  {[Start 
> > > Date.Monthly Calendar].[Month].Members} ON ROWS FROM [CDR]
> > > 
> > > The SQL Mondrian creates is more to what I would expect:
> > > 
> > > 2014-10-03 14:58:27,352 DEBUG [mondrian.mdx] 3002: select NON EMPTY 
> > > {[Measures].[CDR Count]} ON COLUMNS,  NON EMPTY {[Start Date.Monthly 
> > > Calendar].[Month].Members} ON ROWS from [CDR]
> > > 
> > > 2014-10-03 14:58:27,403 DEBUG [mondrian.sql] 286: 
> > > HighCardSqlTupleReader.readTuples [[Start Date.Monthly
> > > Calendar].[Month]]: executing sql [select 
> > > "agg_monthly_fact_cdr"."startdate_year" as "c0", 
> > > "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as 
> > > "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int"
> > > as "c4", "dim_date"."year_month" as "c5" from 
> > > "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr", 
> > > "common"."dim_date" as "dim_date" where "dim_date"."year_quarter_int"
> > > = "agg_monthly_fact_cdr"."startdate_quarter_number" and 
> > > "dim_date"."year_month_int" = 
> > > "agg_monthly_fact_cdr"."startdate_month_number" group by 
> > > "agg_monthly_fact_cdr"."startdate_year",
> > > "dim_date"."year_quarter_int", "dim_date"."year_quarter", 
> > > "dim_date"."quarter_number", "dim_date"."year_month_int", 
> > > "dim_date"."year_month" order by 
> > > "agg_monthly_fact_cdr"."startdate_year" ASC NULLS LAST, 
> > > "dim_date"."year_quarter_int" ASC NULLS LAST, 
> > > "dim_date"."year_month_int" ASC NULLS LAST]
> > > 2014-10-03 14:58:43,759 DEBUG [mondrian.sql] 286: , exec 16355 ms
> > > 2014-10-03 14:58:43,759 DEBUG [mondrian.sql] 286: , exec+fetch 16356 
> > > ms, 10 rows
> > > 2014-10-03 14:58:43,802 DEBUG [mondrian.sql] 287: 
> > > HighCardSqlTupleReader.readTuples [[Start Date.Monthly
> > > Calendar].[Month]]: executing sql [select 
> > > "agg_monthly_fact_cdr"."startdate_year" as "c0", 
> > > "dim_date"."year_quarter_int" as "c1", "dim_date"."year_quarter" as 
> > > "c2", "dim_date"."quarter_number" as "c3", "dim_date"."year_month_int"
> > > as "c4", "dim_date"."year_month" as "c5" from 
> > > "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr", 
> > > "common"."dim_date" as "dim_date" where "dim_date"."year_quarter_int"
> > > = "agg_monthly_fact_cdr"."startdate_quarter_number" and 
> > > "dim_date"."year_month_int" = 
> > > "agg_monthly_fact_cdr"."startdate_month_number" group by 
> > > "agg_monthly_fact_cdr"."startdate_year",
> > > "dim_date"."year_quarter_int", "dim_date"."year_quarter", 
> > > "dim_date"."quarter_number", "dim_date"."year_month_int", 
> > > "dim_date"."year_month" order by 
> > > "agg_monthly_fact_cdr"."startdate_year" ASC NULLS LAST, 
> > > "dim_date"."year_quarter_int" ASC NULLS LAST, 
> > > "dim_date"."year_month_int" ASC NULLS LAST]
> > > 2014-10-03 14:58:59,080 DEBUG [mondrian.sql] 287: , exec 15276 ms
> > > 2014-10-03 14:58:59,080 DEBUG [mondrian.sql] 287: , exec+fetch 15277 
> > > ms, 10 rows
> > > 2014-10-03 14:58:59,125 DEBUG [mondrian.sql] 288: 
> > > SqlStatisticsProvider.getColumnCardinality: executing sql [select 
> > > count(distinct "year4") from "common"."dim_date"]
> > > 2014-10-03 14:58:59,174 DEBUG [mondrian.sql] 288: , exec 48 ms
> > > 2014-10-03 14:58:59,174 DEBUG [mondrian.sql] 288: , exec+fetch 49 ms,
> > > 1 rows
> > > 2014-10-03 14:58:59,219 DEBUG [mondrian.sql] 289: 
> > > SqlStatisticsProvider.getColumnCardinality: executing sql [select 
> > > count(distinct "year_quarter_int") from "common"."dim_date"]
> > > 2014-10-03 14:58:59,269 DEBUG [mondrian.sql] 289: , exec 50 ms
> > > 2014-10-03 14:58:59,269 DEBUG [mondrian.sql] 289: , exec+fetch 50 ms,
> > > 1 rows
> > > 2014-10-03 14:58:59,315 DEBUG [mondrian.sql] 290: 
> > > SqlStatisticsProvider.getColumnCardinality: executing sql [select 
> > > count(distinct "year_month_int") from "common"."dim_date"]
> > > 2014-10-03 14:58:59,362 DEBUG [mondrian.sql] 290: , exec 47 ms
> > > 2014-10-03 14:58:59,363 DEBUG [mondrian.sql] 290: , exec+fetch 48 ms,
> > > 1 rows
> > > 2014-10-03 14:58:59,408 DEBUG [mondrian.sql] 291: Segment.load: 
> > > executing sql [select "agg_monthly_fact_cdr"."startdate_year" as 
> > > "c0", "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", 
> > > "agg_monthly_fact_cdr"."startdate_month_number" as "c2",
> > > sum("agg_monthly_fact_cdr"."calls") as "m0" from 
> > > "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where 
> > > "agg_monthly_fact_cdr"."startdate_year" = 2013 and 
> > > "agg_monthly_fact_cdr"."startdate_quarter_number" = 201304 and 
> > > "agg_monthly_fact_cdr"."startdate_month_number" = 201311 group by 
> > > "agg_monthly_fact_cdr"."startdate_year",
> > > "agg_monthly_fact_cdr"."startdate_quarter_number",
> > > "agg_monthly_fact_cdr"."startdate_month_number"]
> > > 2014-10-03 14:58:59,458 DEBUG [mondrian.sql] 291: , exec 49 ms
> > > 2014-10-03 14:58:59,459 DEBUG [mondrian.sql] 291: , exec+fetch 50 ms,
> > > 1 rows
> > > 2014-10-03 14:58:59,459 DEBUG [mondrian.mdx] 3002: exec: 32107 ms
> > > 2014-10-03 14:58:59,505 DEBUG [mondrian.sql] 292: Segment.load: 
> > > executing sql [select "agg_monthly_fact_cdr"."startdate_year" as 
> > > "c0", "agg_monthly_fact_cdr"."startdate_quarter_number" as "c1", 
> > > "agg_monthly_fact_cdr"."startdate_month_number" as "c2",
> > > sum("agg_monthly_fact_cdr"."calls") as "m0" from 
> > > "dma_cdr"."agg_monthly_fact_cdr" as "agg_monthly_fact_cdr" where 
> > > "agg_monthly_fact_cdr"."startdate_year" = 2014 and 
> > > "agg_monthly_fact_cdr"."startdate_quarter_number" = 201401 and 
> > > "agg_monthly_fact_cdr"."startdate_month_number" = 201402 group by 
> > > "agg_monthly_fact_cdr"."startdate_year",
> > > "agg_monthly_fact_cdr"."startdate_quarter_number",
> > > "agg_monthly_fact_cdr"."startdate_month_number"]
> > > 
> > > So here the strategy seems to be:
> > > 1) Get the year-quarter-month combinations from AGG table (which do 
> > > have a measure) [not quite sure though why this is done twice]
> > > 2) Get the unique year, quarter and month values from the dim date 
> > > table
> > > 3) Loop over the agg table for each year-quarter-month combination 
> > > and get the measure
> > > 
> > > And the result shows all NON EMPTY members - which is what I want.
> > > 
> > > So, why is this happening? Both queries should in theory return the same \
> > > results, no? 
> > > Cheers,
> > > Diddy
> > > _______________________________________________
> > > Mondrian mailing list
> > > Mondrian@pentaho.org
> > > http://lists.pentaho.org/mailman/listinfo/mondrian
> > > _______________________________________________
> > > Mondrian mailing list
> > > Mondrian@pentaho.org
> > > http://lists.pentaho.org/mailman/listinfo/mondrian
> > 
> > _______________________________________________
> > Mondrian mailing list
> > Mondrian@pentaho.org
> > http://lists.pentaho.org/mailman/listinfo/mondrian
> > _______________________________________________
> > Mondrian mailing list
> > Mondrian@pentaho.org
> > http://lists.pentaho.org/mailman/listinfo/mondrian
> 
> _______________________________________________
> Mondrian mailing list
> Mondrian@pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian
> _______________________________________________
> Mondrian mailing list
> Mondrian@pentaho.org
> http://lists.pentaho.org/mailman/listinfo/mondrian

_______________________________________________
Mondrian mailing list
Mondrian@pentaho.org
http://lists.pentaho.org/mailman/listinfo/mondrian


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic