I spotted what I firmly believe to be a bug in PowerPivot. If you already met this problem and/or can reproduce this issue, please support this ticket on MS Connect.
An MS employee might – someday , maybe – respond to my bug report and say this is “by design”. This employee may also advance some pretty good reasons why this is so. However, until then, I cannot conceive of any good reason for this behavior.
Introduction
The data
For the rest of this post, I will use an embedded PowerPivot model based on a linked table. The table is called ‘SomeData’:
Country | Product | OtherAttribute |
USA | A | Y |
FRA | B | Y |
USA | C | N |
FRA | C | N |
UK | A | N |
GER | B | N |
USA | A | N |
FRA | B | N |
FRA | D | N |
What is this bug?
Instead of an elaborate discussion about this bug, let me just start introducing it with a mini-quiz.
Questions
Question 1:
Our model only contains the table ‘SomeData’ presented above. We run the following query against our model.
EVALUATE
CALCULATETABLE( VALUES(‘SomeData’) ,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
)
What does the query return?
- All rows for which Country = “USA” and Product = “A” or Country = “FRA” and Product = “B”.
- All rows for which Country = “USA” or Country = “FRA”
- All rows for which Product = “A” or Product = “B”.
- It depends
Question 2:
Our model only contains the table ‘SomeData’ presented above. We run the following query against our model.
EVALUATE
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
)
What does the query return?
- All rows for which Country = “USA” and Product = “A” or Country = “FRA” and Product = “B”.
- All rows for which Country = “USA” or Country = “FRA”
- All rows for which Product = “A” or Product = “B”.
- It depends
Answers:
Question 1:
1. All rows for which Country = “USA” and Product = “A” or Country = “FRA” and Product = “B”.
Question 2:
4. It depends: The result can be either 2 or 3.
Stating the problem
The only difference between both queries, is the presence (or absence) of the VALUES function in the first argument. This means the expression passed as a first argument to the CALCULATE statement changes the way the 2nd argument is evaluated. This is only supposed to work the other way around.
In the second case, the evaluation of the context is incorrect. PowerPivot will only filter on one column. Which one depends on the order of the columns in the ‘SomeData’ Excel table.
As a result, a query might have different results depending on how the workbook was built.
A working example (direct filtering) …
Returning an arbitrary set of tuples
Consider the following query:
EVALUATE
SUMMARIZE(
FILTER( ‘SomeData’
, [Country] = “USA” && [Product] = “A”
|| [Country] = “FRA” && [Product] = “B”
)
, [Country], [Product]
)
This query returns the set of (Country, Product) tuples according to my predicates:
SomeData[Country] | SomeData[Product] |
USA | A |
FRA | B |
Filtering data based on a calculated set
If I want to get all rows from ‘SomeData’, where the tuple (Product, Country) matches one of the tuples in the previous set, I can use the previous table expression as the 2nd argument of a CALCULATETABLE expression. This gives the following query:
EVALUATE
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( ‘SomeData’
, [Country] = “USA” && [Product] = “A”
|| [Country] = “FRA” && [Product] = “B”
)
, [Country], [Product]
)
)
The results:
SomeData[Country] | SomeData[Product] | SomeData[OtherAttribute] |
USA | A | Y |
FRA | B | Y |
USA | A | N |
FRA | B | N |
So far so good. Everything works as expected.
… that no longer works (cross-filtering)
Returning an arbitrary set of tuples
Instead of explicitly filtering the (Country, Product) tuples, as above, let us now filter our data on the [OtherAttribute] column.
EVALUATE
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
As before, this returns:
SomeData[Country] | SomeData[Product] |
USA | A |
FRA | B |
Filtering data based on a calculated set
Let us plug, our new SUMMARIZE expression into our CALCULATETABLE expression.
EVALUATE
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
)
Unexpectedly, this time, the query returns the following results:
SomeData[Country] | SomeData[Product] | SomeData[OtherAttribute] |
USA | A | Y |
FRA | B | Y |
UK | A | N |
GER | B | N |
USA | A | N |
FRA | B | N |
Note that the engine now returns all rows from ‘SomeData’ for which [Product] is in { “A”, “B” }. This is confirmed by SQL Server Profiler.
Replacing ‘SomeData’ with VALUES(‘SomeData’), however, will return the expected results.
Reordering the columns
Worse, let us just change the order of the columns in our Excel table.
Just drag and drop the [Country] column to the left of the [Product] column, and refresh the PowerPivot model. We have not made any changes to the PowerPivot model itself.
Let us run our previous query again.
The results now become:
Country | Product | OtherAttribute |
USA | A | Y |
FRA | B | Y |
USA | C | N |
FRA | C | N |
USA | A | N |
FRA | B | N |
FRA | D | N |
Which means, the query now only filters by Country. Once again, this is confirmed by SQL Server Profiler.
When a filter context is available
Finally, let us add another CALCULATETABLE expression, so that our query becomes:
EVALUATE
CALCULATETABLE(
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( (‘SomeData’), [OtherAttribute] = “Y”)
, [Country], [Product]
)
)
, ‘SomeData'[Country] = “FRA”)
‘SomeData’ is first filtered by [Country], which only returns the rows for France. These data should then be filtered according to the value in [OtherAttribute], then be summarized by [Country] and [Product]. The one tuple remaining (“FRA”, “B”) should then be used to filter ‘SomeData’ – overriding our initial filter. This is not the case. Here, the product will only get filtered by [Country].
Once again, replacing ‘SomeData’ in the first argument of the innermost CALCULATETABLE expression with VALUES(‘SomeData’) will produce the expected results.
In that case, however, the result of the query does not seem to depend on the order of the columns in the Excel table.
Measures display the same behavior
Example with COUNTROWS
Do measures display the same behavior? Unfortunately, yes.
Let us consider this measure:
[Fine] :=
CALCULATE( COUNTROWS( VALUES(‘SomeData’) ),
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)
This measure works as expected:
Fine | |||
A | B | Total | |
FRA | 2 | 2 | |
USA | 2 | 2 | |
Total | 2 | 2 | 4 |
For the same reasons as in the previous sections, this measure does not:
[Not fine] :=
CALCULATE( COUNTROWS( ‘SomeData’ ),
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)
The result:
Not fine | |||
A | B | Total | |
FRA | 2 | 4 | |
USA | 2 | 3 | |
Total | 3 | 3 | 6 |
Once again the grand total may change depending on the order of the columns in the linked table.
Example with SUM
You might think: “Not an issue. I always use VALUES when referencing tables or columns”.
Do you?
Standard aggregate functions like COUNT, COUNTA, SUM, … only accept a direct reference to a column.
To check this, let us add a new column to our table:
Country | Product | OtherAttribute | CheckColumn |
USA | A | Y | 100000000 |
FRA | B | Y | 20000000 |
USA | C | N | 3000000 |
FRA | C | N | 400000 |
UK | A | N | 50000 |
GER | B | N | 6000 |
USA | A | N | 700 |
FRA | B | N | 80 |
FRA | D | N | 9 |
Now, let us observer the same unexpected behavior.
[Sum fails] : =
CALCULATE( SUM(‘SomeData'[CheckColumn] ),
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)
The results:
Sum fails | |||
A | B | Total | |
FRA | 20000080 | 20400089 | |
USA | 100000700 | 103000700 | |
Total | 100050700 | 20006080 | 120056780 |
Note that it should be easy to track which row is in the context for each cell:
For example, the total for FRA is 20400089. This clearly indicates the last two rows in our previous table (where [CheckColumn] = 80 and 9 ) were included in our table.
Here is one way to make it work as expected.
[Sum works]: =
CALCULATE(
CALCULATE( SUM(‘SomeData'[CheckColumn] ), VALUES(‘SomeData’)) ,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)
Sum works | |||
A | B | Total | |
FRA | 20000080 | 20000080 | |
USA | 100000700 | 100000700 | |
Total | 100000700 | 20000080 | 120000780 |
February 14, 2013 at 2:22 am
If you only have 1 table in your PowerPivot model, why would you need to use the SUMMARIZE function?
February 14, 2013 at 11:25 am
If the goal was just to summarize the data, a simple pivot table would be enough. In that case, however, the goal is to identify the combinations of Country and Product for which some condition is true, and then use these combinations to calculate our final measures.
SUMMARIZE is there to make certain we only pass the relevant tuples ( Product, Country ), and not the other columns, to the CALCULATE / CALCULATETABLE expressions.
I acknowledge the article falls short of explaining why this technique could be useful, but the goal was to demonstrate the existence of a bug.