Inconsistent drillthrough behavior in Power Pivot and SSAS Tabular

When you define more than one relationship between two tables, drillthroughs may return unexpected results.

Get the file!

The problem

If two or more relationships exist between two tables, knowing which relationship is active is not enough to determine the drillthrough behavior of your model.

As a consequence, if you change the active relationship in your model, you may observe unexpected drillthrough behaviors.

Consider the following diagrams, based on the same model.

This is a simple model with two tables. There are two relationships in this model, both linking the table Data to the table Dim: one using the column Rel1, the other one using the column Rel2.

We will call these relationships Rel1 and Rel2 respectively. As you can see from the diagrams, Rel1 is active in both cases.

diagram1

diagram2

In the first case, however, drillthrough will be based on Rel1. In the second case, it will be based on Rel2.

It appears drillthrough occurs according to the bottommost relationship in the Power Pivot diagram view (this is the other way around in Visual Studio, if I recall correctly).

Setup

Here are the data in both tables.

Dim
Click here –>
Rel1 Rel2 Comment
Click here –> Drillthrough based on Rel1
Click here –> Drillthrough based on Rel2

We can create a simple pivot table like the following.

pivot_table

Just after we created the relationships, double-clicking on the cell will create a new sheet returning the following results.

[$Data].[Rel2] [$Data].[Comment]
Drillthrough is based on Rel1

The Comment column makes it easy to spot which relationship was used for the drillthrough: Rel1.

This works as expected.

Changing the active relationship in the diagram view

Let us activate Rel2 in the diagram view, using the following steps:

  1. De-activate Rel1
  2. Double-click on Rel2, toggle the Active checkbox
  3. Press OK

Take a look at the result. The active relationship is now the topmost relationship in the diagram view.

Check the drillthrough sheets. Despite the update, the results stay the same.

Let us activate Rel1 again (using the same steps as before), and check the result. The drillthrough was done according to Rel2!

$Data].[Rel2] [$Data].[Comment]
Drillthrough is based on Rel2

Let us activate Rel2 again: drillthrough was done according to Rel1.

$Data].[Rel2] [$Data].[Comment]
Drillthrough is based on Rel1

You can ad lib this.

Note that if you activate a relationship using the following steps:

  1. Right-click the relationship
  2. Mark as active

Its position in the diagram view will not change.

As a corollary, it will also have no impact on the drillthrough behavior.

Workaround

The easiest way to change the active relationship in a model while maintaining a consistent drillthrough behavior is to

  1. delete the relationships
  2. recreate them, starting with the (new) active one.

Microsoft Data Explorer–a review

As you are probably aware, Microsoft released a Preview version of the Data Explorer add-In. It is currently available for Excel 2010 and Excel 2013. Hopefully, this will still be true for the release version (and not only for VL customers).

Mashing up data from different sources has never been the great strength of Excel. Sure, it was possible, but implied a mix of queries formulas, and macros that often led to crappy convoluted workbooks. Data Explorer allows users to create data mash-ups from within Excel, using a single interface and a consistent expression language.

The ribbon

The interface is good, superior to the one provided in PowerPivot, or DQS (other MS tools that allow business users to actively contribute to the information system.)

The seamless integration with Excel is a huge advantage over similar tools (say, Google Refine).

Since the alpha version, things have changed substantially. The Excel add-in now comes with a ribbon, which is clean and efficient, as well as custom task panes in the Modern UI (aka Windows 8) style.

image

You first connect to a source, then shape your query right away. Many sources are available, including tables in the current workbook. You can also reference Data Explorer queries in other queries. This means queries can be re-used. You do not have to rewrite the same query every time. The Merge and Append options allow you to do joins and unions.

Data Explorer accepts hierarchical data sources, not just flat ones. In simple terms, a cell in a column can contain a table. In the query editor, you can navigate to a nested table by double-clicking it. The interface provides a consistent experience across all types of sources.

The editor

The editor was revamped and streamlined. It is intuitive and works smoothly.

image

The ribbon is gone, which is not a bad thing. The formula bar allows you to type expressions, using the “M” language that Data Explorer uses (more on that later). At the time, there is no support for an Intellisense-like feature, and no function wizard as you may know it from Excel.

However, you can do many things from the interface, without ever writing an expression: you can filter, group and transform data in columns … by right-clicking the right object and choosing the relevant transformation. The generated expression can be seen (and edited) in the formula bar.

Each transformation you apply creates a new step in your query. You can rearrange your query by moving steps up or down (except the Source step which you cannot rename or move): did you want to summarize your data, and then filter instead of filtering then summarizing? No problem. Just move the relevant step up or down.

This has a few advantages:

  • The workflow is natural. You can edit one transformation at a time.
  • For every step, you get a visual feedback of the transformation you just applied.
  • This allows you to keep your queries tidy and avoid unreadable nested calculations

When you write your own expressions, you can choose between chaining multiple steps or nesting/combining expressions. The only thing you should actually be worried about is whether your decision will make the workflow easier to read.

You can rename queries from the designer, but also rename each step. As far as I know, this is the only way to document steps in a query.

Adding calculated columns to a query might require you to write expressions. In many use cases, this will be easy. However, some common calculations (dates, string manipulations, …) require a call to a specific function. For the moment, due to the state of the documentation, finding the function you need requires extra work.

I hope Microsoft will not forget to improve the documentation, once the product gets released. The basic documentation could afford some formatting. The language specification documents will probably be too dense for most users.

The expression language

Data Explorer uses the “M” language, which is expression-based.

This means this language is about writing expressions (formulas) just like you would in Excel or DAX. From what I could take from the documentation and quick experimentations, the language is actually extremely powerful.

A bunch of people at Microsoft Research, are very much into functional programing: if you liked LINQ and F#, then you will like the M language. (Microsoft once had a project called Oslo which featured a language  called “M”. Is it the same? I could not recognize it.)

The language supports different types of values: primitive values (string, numbers, …) but also lists, records, tables, and functions.

This means an expression can return a list, or records, or tables, or functions. A data explorer query can return a function, or a table of functions. You can apply fold operations on lists, define functions within a query, create functions that returns functions, or accept functions as parameters … The language supports closure, recursion …

  • Return a list of number:
    Source := { 1 , 2, 3 }
  • Return numbers from 1 to 10:
    Source  := { 1 … 10 }
  • Return a record with A, B, C where C is a calculated column :
    Source := [ A=1,   B=2, C = A + B ]
  • Return a function, and apply it to a value:
    Source := (x) => x + 1
    InvokedSource := Source(12)
  • Return a string that displays numbers from 1 to 10, separated by a comma
    Source := Text.RemoveRange(List.Accumulate({ 1 .. 10 } ,””, (state, acc) => state & “, ”  & Text.From(acc)  ) ,  0, 2)

You could create a query that takes a document as a source, parses the document and returns a function!

What you will be able to do with the language is absolutely huge. However, as I said before, most use cases will not require a full command of the language.

Conclusion

Data Explorer is not Excel: even if you can do  a lot without knowing much of the language, you may need to learn new functions.

However, due to an intuitive interface and a powerful expression language, the tool might appeal to a large audience from the slightly advanced Excel user, to the R addict.

I am not certain it will appeal to all types of developers, though:

  • I do not expect every C# developer to be a fan of functional programing
  • for the moment, Data Explorer is only available in Excel

On the other hand, this will not be a problem if Data Explorer queries (transformations) can be published as data services, or re-used within SSIS packages.

This leads to the following concern: The alpha version was very much about publishing your queries to the cloud. However, I could not find any reference to this scenario, whether in the ribbon or in the documentation. I hope this is an oversight on my part and this feature will be supported.

Data Explorer comes as a bunch of .Net dlls and an interface written in HTML/JavaScript. I really hope the guys at Microsoft will document the libraries, and provide an official entry point so that developers can add custom functionalities to the query designer, or build custom apps on top of the engine.

Finally, Data Explorer might even be too good to just be an Excel add-in. In the future, I could very well imagine never using an Excel formula again (at least if queries could be automatically updated upon data entry.) That summarizes how impressed I was with the product.

Bug in PowerPivot 2012 for Excel 2010

I spotted what I firmly believe to be a bug in PowerPivot. If you already met this problem and/or can reproduce this issue, please support this ticket on MS Connect.

Edit: This bug was fixed with SQL Server 2012 Service Pack 2.

An MS employee might – someday , maybe – respond to my bug report and say this is “by design”. This employee may also advance some pretty good reasons why this is so. However, until then, I cannot conceive of any good reason for this behavior.

Introduction

The data

For the rest of this post, I will use an embedded PowerPivot model based on a linked table. The table is called ‘SomeData’:

Country Product OtherAttribute
USA A Y
FRA B Y
USA C N
FRA C N
UK A N
GER B N
USA A N
FRA B N
FRA D N

What is this bug?

Instead of an elaborate discussion about this bug, let me just start introducing it with a mini-quiz.

Questions

Question 1:

Our model only contains the table ‘SomeData’ presented above. We run the following query against our model.

EVALUATE
CALCULATETABLE(     VALUES(‘SomeData’) ,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
)

What does the query return?

  1. All rows for which Country = “USA” and Product = “A” or Country = “FRA” and Product = “B”.
  2. All rows for which Country = “USA” or Country = “FRA”
  3. All rows for which Product = “A” or Product = “B”.
  4. It depends

Question 2:

Our model only contains the table ‘SomeData’ presented above. We run the following query against our model.

EVALUATE
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
)

What does the query return?

  1. All rows for which Country = “USA” and Product = “A” or Country = “FRA” and Product = “B”.
  2. All rows for which Country = “USA” or Country = “FRA”
  3. All rows for which Product = “A” or Product = “B”.
  4. It depends

Answers:

Question 1:

1. All rows for which Country = “USA” and Product = “A” or Country = “FRA” and Product = “B”.

Question 2:

4. It depends: The result can be either 2 or 3.

Stating the problem

The only difference between both queries, is the presence (or absence) of the VALUES function in the first argument. This means the expression passed as a first argument to the CALCULATE statement changes the way the 2nd argument is evaluated. This is only supposed to work the other way around.

In the second case, the evaluation of the context is incorrect. PowerPivot will only filter on one column. Which one depends on the order of the columns in the ‘SomeData’ Excel table.

As a result, a query might  have different results depending on how the workbook was built.

A working example (direct filtering) …

Returning an arbitrary set of tuples

Consider the following query:

EVALUATE
SUMMARIZE(
FILTER( ‘SomeData’
, [Country] = “USA” && [Product] = “A”
|| [Country] = “FRA” && [Product] = “B”
)
, [Country], [Product]
)

This query returns the set of (Country, Product) tuples according to my predicates:

SomeData[Country] SomeData[Product]
USA A
FRA B

Filtering data based on a calculated set

If I want to get all rows from ‘SomeData’, where the tuple (Product,  Country) matches one of the tuples in the previous set, I can use the previous table expression as the 2nd argument of a CALCULATETABLE expression. This gives the following query:

EVALUATE
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( ‘SomeData’
, [Country] = “USA” && [Product] = “A”
|| [Country] = “FRA” && [Product] = “B”
)
, [Country], [Product]
)
)

The results:

SomeData[Country] SomeData[Product] SomeData[OtherAttribute]
USA A Y
FRA B Y
USA A N
FRA B N

So far so good.  Everything works as expected.

… that no longer works (cross-filtering)

Returning an arbitrary set of tuples

Instead of explicitly filtering the (Country, Product) tuples, as above, let us now filter our data on the [OtherAttribute] column.

EVALUATE
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)

As before, this returns:

SomeData[Country] SomeData[Product]
USA A
FRA B

Filtering data based on a calculated set

Let us plug, our new SUMMARIZE expression into our CALCULATETABLE expression.

EVALUATE
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country], [Product]
)
)

Unexpectedly, this time, the query returns the following results:

SomeData[Country] SomeData[Product] SomeData[OtherAttribute]
USA A Y
FRA B Y
UK A N
GER B N
USA A N
FRA B N

Note that the engine now returns all rows from ‘SomeData’ for which [Product] is in { “A”, “B” }. This is confirmed by SQL Server Profiler.

Replacing ‘SomeData’ with VALUES(‘SomeData’), however, will return the expected results.

Reordering the columns

Worse, let us just change the order of the columns in our Excel table.

Just drag and drop the [Country] column to the left of the [Product] column, and refresh the PowerPivot model. We have not made any changes to the PowerPivot model itself.

Let us run our previous query again.

The results now become:

Country Product OtherAttribute
USA A Y
FRA B Y
USA C N
FRA C N
USA A N
FRA B N
FRA D N

Which means, the query now only filters by Country. Once again, this is confirmed by SQL Server Profiler.

When a filter context is available

Finally, let us add another CALCULATETABLE expression, so that our query becomes:

EVALUATE
CALCULATETABLE(
CALCULATETABLE( ‘SomeData’,
SUMMARIZE(
FILTER( (‘SomeData’), [OtherAttribute] = “Y”)
, [Country], [Product]
)
)
, ‘SomeData'[Country] = “FRA”)

‘SomeData’ is first filtered by [Country], which only returns the rows for France. These data should then be filtered according to the value in [OtherAttribute], then be summarized by [Country] and [Product]. The one tuple remaining (“FRA”, “B”) should then be used to  filter ‘SomeData’ – overriding our initial filter. This is not the case. Here, the product will only get filtered by [Country].

Once again, replacing ‘SomeData’ in the first argument of the innermost CALCULATETABLE expression with VALUES(‘SomeData’) will produce the expected results.

In that case, however, the result of the query does not seem to depend on the order of the columns in the Excel table.

Measures display the same behavior

Example with COUNTROWS

Do measures display the same behavior? Unfortunately, yes.

Let us consider this measure:

[Fine] :=

    CALCULATE( COUNTROWS( VALUES(‘SomeData’) ),
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)

This measure works as expected:

Fine
A B Total
FRA 2 2
USA 2 2
Total 2 2 4

For the same reasons as in the previous sections, this measure does not:

[Not fine] :=

    CALCULATE( COUNTROWS( ‘SomeData’ ),
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)

The result:

Not fine
A B Total
FRA 2 4
USA 2 3
Total 3 3 6

Once again the grand total may change depending on the order of the columns in the linked table.

Example with SUM

You might think: “Not an issue. I always use VALUES when referencing tables or columns”.

Do  you?

Standard aggregate functions like COUNT, COUNTA, SUM, … only accept a direct reference to a column.

To check this, let us add a new column to our table:

Country Product OtherAttribute CheckColumn
USA A Y 100000000
FRA B Y 20000000
USA C N 3000000
FRA C N 400000
UK A N 50000
GER B N 6000
USA A N 700
FRA B N 80
FRA D N 9

Now, let us observer the same unexpected behavior.

[Sum fails] : =

CALCULATE(   SUM(‘SomeData'[CheckColumn]   ),
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)

The results:

Sum fails
A B Total
FRA 20000080 20400089
USA 100000700 103000700
Total 100050700 20006080 120056780

Note that it should be easy to track which row is in the context for each cell:

For example, the total for FRA is 20400089.  This clearly indicates the last two rows in our previous table (where [CheckColumn] = 80 and 9 ) were included in our table.

Here is one way to make it work as expected.

[Sum works]: =

CALCULATE(
CALCULATE( SUM(‘SomeData'[CheckColumn]   ), VALUES(‘SomeData’)) ,
SUMMARIZE(
FILTER( ‘SomeData’, [OtherAttribute] = “Y”)
, [Country],[Product]
)
)

Sum works
A B Total
FRA 20000080 20000080
USA 100000700 100000700
Total 100000700 20000080 120000780

DAX Queries in Excel 2013 : using ROLLUP to cover some UNION scenarios

DAX does not offer a UNION function or operator. There are a few scenarios where this would be useful.

The scenario

Consider the following table called Games.

image

This is a very simple table. Basically, this table represents a relation between players, which can be stated as “Player 1 played against Player 2”. For now, I will ignore the meaning of why one player is listed in one column or the other.

In order to analyze this data, we will need a Player dimension, representing the set of all players, regardless of whether they were labeled as Player 1 or Player 2.

If the Games table came from a SQL connection, then we might as well use the following query to get the desired result.

SELECT [Player 1] AS Player FROM Games
UNION
SELECT [Player 2] FROM Games

The table would look like this.

image

(Remember that UNION only returns distinct rows).

A DAX solution

I already mentioned Excel 2013 supports DAX queries,  and the resulting tables can be re-used in the data model. Let us take a look at what we need to achieve the desired result.

First thing we need to care about, is that our result set might have more entries than our Games table. We will need to generate at least one row per distinct value in player 1 and 1 row per distinct value in player 2.

Now, imagine you create a PivotTable based on the Games table and put [Player 1] and [Player 2] on rows. This is what you will get:

image

For every distinct value of [Player 1], Excel will create a group, list all the values for [Player 2] within this group, and will add a row for the sub-total.

Since we cannot use the body range of a pivot table, as a data source, we will do this with a DAX query.

SUMMARIZE and ROLLUP will allow us to mimic this behavior.

The following DAX statement

EVALUATE
SUMMARIZE(
Games
, [Player 1]
, ROLLUP(  [Player 2])
, “Player”
, IF (ISSUBTOTAL(Games[Player 2]),[Player 1] , [Player 2])
)

will return the following table:

image

Note that ROLLUP([Player 2]) instructs PowerPivot to generate a subtotal row for the current value of [Player 1]. We also create a calculated column named Player, that will just take the value of [Player 2] for “regular” rows, and the value of [Player 1] for sub-totals. Also, ISSUBTOTAL allows us to differentiate subtotal rows from rows where [Player 2] is blank.

Finally, we just want to get the unique values in the Player column. For that, we will use, once again, the SUMMARIZE function.

The final query

EVALUATE
SUMMARIZE(
SUMMARIZE(
Games
, [Player 1]
, ROLLUP(  [Player 2])
, “Player”
, IF (ISSUBTOTAL(Games[Player 2]),[Player 1] , [Player 2])
)
,  [Player]
)

DAX Queries in Excel 2013

The “old” drill-through trick that could be used in Office 2010 and PowerPivot v2 no longer works in Excel 2013. You can no longer edit the connection for the resulting table.

Luckily, this is no longer required.

If you want to include a DAX query based on the model in your workbook, you can proceed as follows:

  • Go to Data > Existing Connections (EDIT: from the regular Excel ribbon, not the PowerPivot window.)
  • Select a table
  • When asked how you want to view your data, select “Table”
  • To edit the DAX query, right click on the table, then choose Table > Edit DAX …

If, like me, you get the error message “query could not run or the data model could not be accessed. Query (row, position) cannot reference a base table in the EVALUATE query, because it is undetermined”, check whether your query is correct.

The following query, for example, would run smoothly for the appropriate table.

EVALUATE
SUMMARIZE(Table,
ROLLUP( Table[Category],
Table[Subcategory]),
“Sales”, COUNTA( Table[Subcategory])

)

Note that in this mode, the EN-US locales seem to be in use.

Finally, you can reuse the table in your model, if you wish. I have not tested how this affects the updating of the model, though.

Slicers and Drill-through in Excel.

While I was playing around with Excel 2013, I noticed some inconsistency in the way slicers behave depending on whether they are used to filter a PowerPivot table, or whether they are used to filter a traditional pivot table.

With traditional pivot tables, slicers do not affect drill-through if they are applied on hidden fields. In other words, if you use a slicer to filter Field1, then you must place Field1 somewhere in your table (row, column, or page) in order for the drill-through to behave as expected.

This is not the case with PowerPivot tables. Drill-through works as expected.

What’s new for Excel 2013 – Personal observations

The customer preview of Office 2013 is finally available, and there is a lot of new things to look at, be it for users or for developers.

Here are a few personal observations based on my first look at the product.

  • PowerPivot for Office 2013  now comes bundled with Excel, making Office On Demand, the evolution of the Click-To-Run technology, much more viable.
  • The PowerPoint add-in is no longer required to build simple data models and pivot tables based on several data tables. Measures, KPIs and some other features will require the add-in to be activated.
  • The message “To use multiple table in your analysis, a new PivotTable needs to be created based on the Data Model.” hints at the fact that old-school pivot tables are not dead yet and still are the default PivotTable type.
  • New table objects in your workbook will automatically be added to your data model.
  • You can create a pivot table based on a simple range and then transform the pivot table into a “PowerPivot model”. The initial range will not be transformed into a table object. Further ranges can be added through the Connections manager. (What for?)
  • The data model can now be accessed from VBA. The corresponding object is called Model (not DataModel), despite what the MSDN documentation mentions about it.
  • The PowerPivot add-in will not recognize tables that were added directly to the data model from Excel as linked tables. However, a new connection will be created in Excel, so that refactoring can be done from the Connections manager. A corresponding object called WorksheetDataConnection is available in VBA.
  • You can create two worksheet data connections for the same range. Duplicating ranges for scenarios where the same data must be used twice (for example, when a dimension must play different roles) is no longer required.
  • The function FILTERXML allows you to query an XML document with XPath. The WEBSERVICE function returns an XML text from a web service. ENCODEURL is a helper function to encode a string into a valid URL. All three functions are available in VBA through the WorksheetFunction object.
  • The function FILTERXML returns an array. You can use it in array formulas, or in conjunction with the INDEX function, for example.
  • New functions like SHEET, and SHEETS are available, although it is still unclear to me in which scenarios they will be useful, since no specific function takes a sheet index as a parameter.
  • ISFORMULA and FORMULATEXT are also new in Excel 2013.
  • Examples in the help system are provided as embedded Excel Web App workbooks. This may solve some translation issues that have occurred in the past for non-English Office versions.
  • Internet Explorer is used to navigate the VBA help. You can now alt-tab between the VBA environment and the documentation.
%d bloggers like this: