Bug in PowerPivot 2012 for Excel 2010


I spotted what I firmly believe to be a bug in PowerPivot. If you already met this problem and/or can reproduce this issue, please support this ticket on MS Connect.

An MS employee might – someday , maybe – respond to my bug report and say this is “by design”. This employee may also advance some pretty good reasons why this is so. However, until then, I cannot conceive of any good reason for this behavior.

Introduction

The data

For the rest of this post, I will use an embedded PowerPivot model based on a linked table. The table is called ‘SomeData’:

Country Product OtherAttribute
USA A Y
FRA B Y
USA C N
FRA C N
UK A N
GER B N
USA A N
FRA B N
FRA D N

What is this bug?

Instead of an elaborate discussion about this bug, let me just start introducing it with a mini-quiz.

Questions

Question 1:

Our model only contains the table ‘SomeData’ presented above. We run the following query against our model.

EVALUATE
CALCULATETABLE(     VALUES(‘SomeData’) ,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
)

What does the query return?

  1. All rows for which Country = “USA” and Product = “A” or Country = “FRA” and Product = “B”.
  2. All rows for which Country = “USA” or Country = “FRA”
  3. All rows for which Product = “A” or Product = “B”.
  4. It depends

Question 2:

Our model only contains the table ‘SomeData’ presented above. We run the following query against our model.

EVALUATE
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
)

What does the query return?

  1. All rows for which Country = “USA” and Product = “A” or Country = “FRA” and Product = “B”.
  2. All rows for which Country = “USA” or Country = “FRA”
  3. All rows for which Product = “A” or Product = “B”.
  4. It depends

Answers:

Question 1:

1. All rows for which Country = “USA” and Product = “A” or Country = “FRA” and Product = “B”.

Question 2:

4. It depends: The result can be either 2 or 3.

Stating the problem

The only difference between both queries, is the presence (or absence) of the VALUES function in the first argument. This means the expression passed as a first argument to the CALCULATE statement changes the way the 2nd argument is evaluated. This is only supposed to work the other way around.

In the second case, the evaluation of the context is incorrect. PowerPivot will only filter on one column. Which one depends on the order of the columns in the ‘SomeData’ Excel table.

As a result, a query might  have different results depending on how the workbook was built.

A working example (direct filtering) …

Returning an arbitrary set of tuples

Consider the following query:

EVALUATE
SUMMARIZE(
FILTER( ‘SomeData’
, [Country] = “USA” && [Product] = “A”
|| [Country] = “FRA” && [Product] = “B”
)
, [Country], [Product]
)

This query returns the set of (Country, Product) tuples according to my predicates:

SomeData[Country] SomeData[Product]
USA A
FRA B

Filtering data based on a calculated set

If I want to get all rows from ‘SomeData’, where the tuple (Product,  Country) matches one of the tuples in the previous set, I can use the previous table expression as the 2nd argument of a CALCULATETABLE expression. This gives the following query:

EVALUATE
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( ‘SomeData’
, [Country] = “USA” && [Product] = “A”
|| [Country] = “FRA” && [Product] = “B”
)
, [Country], [Product]
)
)

The results:

SomeData[Country] SomeData[Product] SomeData[OtherAttribute]
USA A Y
FRA B Y
USA A N
FRA B N

So far so good.  Everything works as expected.

… that no longer works (cross-filtering)

Returning an arbitrary set of tuples

Instead of explicitly filtering the (Country, Product) tuples, as above, let us now filter our data on the [OtherAttribute] column.

EVALUATE
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)

As before, this returns:

SomeData[Country] SomeData[Product]
USA A
FRA B

Filtering data based on a calculated set

Let us plug, our new SUMMARIZE expression into our CALCULATETABLE expression.

EVALUATE
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
)

Unexpectedly, this time, the query returns the following results:

SomeData[Country] SomeData[Product] SomeData[OtherAttribute]
USA A Y
FRA B Y
UK A N
GER B N
USA A N
FRA B N

Note that the engine now returns all rows from ‘SomeData’ for which [Product] is in { “A”, “B” }. This is confirmed by SQL Server Profiler.

Replacing ‘SomeData’ with VALUES(‘SomeData’), however, will return the expected results.

Reordering the columns

Worse, let us just change the order of the columns in our Excel table.

Just drag and drop the [Country] column to the left of the [Product] column, and refresh the PowerPivot model. We have not made any changes to the PowerPivot model itself.

Let us run our previous query again.

The results now become:

Country Product OtherAttribute
USA A Y
FRA B Y
USA C N
FRA C N
USA A N
FRA B N
FRA D N

Which means, the query now only filters by Country. Once again, this is confirmed by SQL Server Profiler.

When a filter context is available

Finally, let us add another CALCULATETABLE expression, so that our query becomes:

EVALUATE
CALCULATETABLE(
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( (‘SomeData’), [OtherAttribute] = “Y”)
, [Country], [Product]
)
)
, ‘SomeData’[Country] = “FRA”)

‘SomeData’ is first filtered by [Country], which only returns the rows for France. These data should then be filtered according to the value in [OtherAttribute], then be summarized by [Country] and [Product]. The one tuple remaining (“FRA”, “B”) should then be used to  filter ‘SomeData’ – overriding our initial filter. This is not the case. Here, the product will only get filtered by [Country].

Once again, replacing ‘SomeData’ in the first argument of the innermost CALCULATETABLE expression with VALUES(‘SomeData’) will produce the expected results.

In that case, however, the result of the query does not seem to depend on the order of the columns in the Excel table.

Measures display the same behavior

Example with COUNTROWS

Do measures display the same behavior? Unfortunately, yes.

Let us consider this measure:

[Fine] :=

    CALCULATE( COUNTROWS( VALUES(‘SomeData’) ),
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)

This measure works as expected:

Fine
A B Total
FRA 2 2
USA 2 2
Total 2 2 4

For the same reasons as in the previous sections, this measure does not:

[Not fine] :=

    CALCULATE( COUNTROWS( ‘SomeData’ ),
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)

The result:

Not fine
A B Total
FRA 2 4
USA 2 3
Total 3 3 6

Once again the grand total may change depending on the order of the columns in the linked table.

Example with SUM

You might think: “Not an issue. I always use VALUES when referencing tables or columns”.

Do  you?

Standard aggregate functions like COUNT, COUNTA, SUM, … only accept a direct reference to a column.

To check this, let us add a new column to our table:

Country Product OtherAttribute CheckColumn
USA A Y 100000000
FRA B Y 20000000
USA C N 3000000
FRA C N 400000
UK A N 50000
GER B N 6000
USA A N 700
FRA B N 80
FRA D N 9

Now, let us observer the same unexpected behavior.

[Sum fails] : =

CALCULATE(   SUM(‘SomeData’[CheckColumn]   ),
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)

The results:

Sum fails
A B Total
FRA 20000080 20400089
USA 100000700 103000700
Total 100050700 20006080 120056780

Note that it should be easy to track which row is in the context for each cell:

For example, the total for FRA is 20400089.  This clearly indicates the last two rows in our previous table (where [CheckColumn] = 80 and 9 ) were included in our table.

Here is one way to make it work as expected.

[Sum works]: =

CALCULATE(
CALCULATE( SUM(‘SomeData’[CheckColumn]   ), VALUES(‘SomeData’)) ,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)

Sum works
A B Total
FRA 20000080 20000080
USA 100000700 100000700
Total 100000700 20000080 120000780
About these ads

2 Responses to “Bug in PowerPivot 2012 for Excel 2010”

  1. David Hager Says:

    If you only have 1 table in your PowerPivot model, why would you need to use the SUMMARIZE function?

    • TheDataSpecialist Says:

      If the goal was just to summarize the data, a simple pivot table would be enough. In that case, however, the goal is to identify the combinations of Country and Product for which some condition is true, and then use these combinations to calculate our final measures.

      SUMMARIZE is there to make certain we only pass the relevant tuples ( Product, Country ), and not the other columns, to the CALCULATE / CALCULATETABLE expressions.

      I acknowledge the article falls short of explaining why this technique could be useful, but the goal was to demonstrate the existence of a bug.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 25 other followers

%d bloggers like this: