Using a self-signed SSL certificate with SQL Server

Learning to configure SQL Server to use a self-signed SSL certificate was not really part of my training plan. However, Data Explorer required SSL encryption to connect to SQL Server. On the same day I managed to make it work, the DE team released an update that waives this requirement… *sigh*…

The steps described below are just for development and tests, to connect to a local instance of SQL Server. Best practices for SSL encryption and server security may look different.

Here is a summary of how I did it on my machine (SQL Server 2008 and 2012, Express editions included, on Windows 7 Ultimate).

Short version

To configure SQL Server to use a self-signed SSL certificate, you will have to:

  1. Create a self-signed certificate
  2. Set permissions for this certificate
  3. Configure SQL Server to use this certificate

Point number 2 is the one I had the most problems with. The SQL Server log had entries like these:

  • "The server could not load the certificate it needs to initiate an SSL connection. It returned the following error: 0x8009030d. Check certificates to make sure they are valid."
  • "Unable to load user-specified certificate …"
  • "… Unable to initialize SSL support. Cannot find object or property."

Solution: Grant SQL Server rights to read the private key for the certificate.

All steps are described below.

Create a certificate

With IIS

If you have IIS on your machine:

  • Start IIS Manager
  • Go to Server Certificates
  • Right Click > Create Self-Signed Certificate

With makecert

The process as explained by SQL Server BOL uses the makecert command-line utility. Since I could not find makecert until after I had solved my problem, I did not follow BOL. The tool ships with the Windows SDK.
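For reference, the BOL procedure boils down to a single command. Here is a sketch of a typical invocation for a server-authentication certificate (adapted from BOL, not the exact command I ran; replace the CN and the validity dates with your own):

makecert -r -pe -n "CN=myserver.mydomain.local" -b 01/01/2013 -e 01/01/2016 -eku 1.3.6.1.5.5.7.3.1 -ss my -sr localMachine -sky exchange -sp "Microsoft RSA SChannel Cryptographic Provider" -sy 12

The -eku value (1.3.6.1.5.5.7.3.1) marks the certificate for server authentication, -pe makes the private key exportable, and -ss my -sr localMachine places it in the Personal store of the computer account, which is exactly where the Certificates snap-in described below expects to find it.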

Set permissions for the certificate

Find the service account name for your SQL Server instance

  • Start SQL Server Configuration Manager
  • Go to SQL Server Services
  • Select your instance
  • Right-click > Properties
  • Copy the account name for later re-use

Launch the Certificates snap-in

One way to manage your certificates is to

  • Launch MMC (type mmc.exe in the Windows search bar)
  • Go to File > Add / Remove Snap-in …
  • Select Certificates (when prompted, choose Computer Account)

Grant SQL Server rights to read the private key for the certificate.

  • In MMC, go to Certificates (Local computer) > Personal > Certificates
  • The certificate should be listed there. (If you created the certificate using makecert, you may have to import it.)
  • Right-click > All Tasks > Manage Private Keys
  • Add the service account for your instance of SQL Server
  • Give the service account Read permissions

Configure SQL Server to use this certificate

  • Start SQL Server Configuration Manager
  • Go to SQL Server Network configuration
  • Select your instance
  • Right-click > Properties > Certificate tab
  • Choose the certificate you just created
  • Restart the service

Conclusion

I tried to make this post as detailed as possible, but not too detailed. Any questions, suggestions, or corrections? Use the comments below.

Microsoft Data Explorer – a review

As you are probably aware, Microsoft released a Preview version of the Data Explorer add-in. It is currently available for Excel 2010 and Excel 2013. Hopefully, this will still be true for the release version (and not only for VL customers).

Mashing up data from different sources has never been the great strength of Excel. Sure, it was possible, but it implied a mix of queries, formulas, and macros that often led to crappy, convoluted workbooks. Data Explorer allows users to create data mash-ups from within Excel, using a single interface and a consistent expression language.

The ribbon

The interface is good, superior to the ones provided in PowerPivot or DQS (other MS tools that allow business users to actively contribute to the information system).

The seamless integration with Excel is a huge advantage over similar tools (say, Google Refine).

Since the alpha version, things have changed substantially. The Excel add-in now comes with a ribbon, which is clean and efficient, as well as custom task panes in the Modern UI (aka Windows 8) style.

[screenshot: the Data Explorer ribbon]

You first connect to a source, then shape your query right away. Many sources are available, including tables in the current workbook. You can also reference Data Explorer queries in other queries. This means queries can be re-used. You do not have to rewrite the same query every time. The Merge and Append options allow you to do joins and unions.

Data Explorer accepts hierarchical data sources, not just flat ones. In simple terms, a cell in a column can contain a table. In the query editor, you can navigate to a nested table by double-clicking it. The interface provides a consistent experience across all types of sources.

The editor

The editor was revamped and streamlined. It is intuitive and works smoothly.

[screenshot: the query editor]

The ribbon is gone, which is not a bad thing. The formula bar allows you to type expressions, using the "M" language that Data Explorer uses (more on that later). At the time of writing, there is no support for an IntelliSense-like feature, and no function wizard as you may know it from Excel.

However, you can do many things from the interface, without ever writing an expression: you can filter, group and transform data in columns … by right-clicking the right object and choosing the relevant transformation. The generated expression can be seen (and edited) in the formula bar.

Each transformation you apply creates a new step in your query. You can rearrange your query by moving steps up or down (except the Source step which you cannot rename or move): did you want to summarize your data, and then filter instead of filtering then summarizing? No problem. Just move the relevant step up or down.

This has a few advantages:

  • The workflow is natural. You can edit one transformation at a time.
  • For every step, you get a visual feedback of the transformation you just applied.
  • This allows you to keep your queries tidy and avoid unreadable nested calculations.

When you write your own expressions, you can choose between chaining multiple steps or nesting/combining expressions. The only thing you should actually be worried about is whether your decision will make the workflow easier to read.
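To illustrate (a minimal sketch in M; the step names are mine), this query chains three named steps:

let
    // the numbers from 1 to 10
    Source = { 1 .. 10 },
    // double each number
    Doubled = List.Transform(Source, each _ * 2),
    // sum the doubled numbers
    Total = List.Sum(Doubled)
in
    Total

You could get the same result in a single nested step, Total = List.Sum(List.Transform({ 1 .. 10 }, each _ * 2)), but the chained form stays readable as the query grows.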

You can rename queries from the designer, but also rename each step. As far as I know, this is the only way to document steps in a query.

Adding calculated columns to a query might require you to write expressions. In many use cases, this will be easy. However, some common calculations (dates, string manipulations, …) require a call to a specific function. For the moment, due to the state of the documentation, finding the function you need requires extra work.

I hope Microsoft will not forget to improve the documentation once the product gets released. The basic documentation could do with some formatting. The language specification documents will probably be too dense for most users.

The expression language

Data Explorer uses the “M” language, which is expression-based.

This means the language is about writing expressions (formulas), just like you would in Excel or DAX. From what I could gather from the documentation and quick experiments, the language is actually extremely powerful.

A bunch of people at Microsoft Research are very much into functional programming: if you liked LINQ and F#, then you will like the M language. (Microsoft once had a project called Oslo which featured a language called "M". Is it the same? I could not recognize it.)

The language supports different types of values: primitive values (string, numbers, …) but also lists, records, tables, and functions.

This means an expression can return a list, or records, or tables, or functions. A Data Explorer query can return a function, or a table of functions. You can apply fold operations to lists, define functions within a query, create functions that return functions, or accept functions as parameters… The language supports closures, recursion…

  • Return a list of numbers:
    Source = { 1, 2, 3 }
  • Return the numbers from 1 to 10:
    Source = { 1 .. 10 }
  • Return a record with fields A, B, and C, where C is calculated from A and B:
    Source = [ A = 1, B = 2, C = A + B ]
  • Return a function, and apply it to a value:
    Source = (x) => x + 1
    InvokedSource = Source(12)
  • Return a string that displays the numbers from 1 to 10, separated by commas:
    Source = Text.RemoveRange(List.Accumulate({ 1 .. 10 }, "", (state, current) => state & ", " & Text.From(current)), 0, 2)

You could create a query that takes a document as a source, parses the document and returns a function!
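Here is a minimal sketch of a complete query whose result is a function (a plain let expression; all the names are mine):

let
    // a function that returns another function:
    // the inner function appends a fixed suffix to any text value
    MakeSuffixer = (suffix as text) => ((input as text) => input & suffix),
    // the query returns this function; AddBang("Hello") yields "Hello!"
    AddBang = MakeSuffixer("!")
in
    AddBang

Since AddBang closes over suffix, this also demonstrates the closure support mentioned above.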

What you will be able to do with the language is absolutely huge. However, as I said before, most use cases will not require a full command of the language.

Conclusion

Data Explorer is not Excel: even if you can do a lot without knowing much of the language, you may need to learn new functions.

However, thanks to an intuitive interface and a powerful expression language, the tool might appeal to a large audience, from the slightly advanced Excel user to the R addict.

I am not certain it will appeal to all types of developers, though:

  • I do not expect every C# developer to be a fan of functional programming
  • For the moment, Data Explorer is only available in Excel

On the other hand, this will not be a problem if Data Explorer queries (transformations) can be published as data services, or re-used within SSIS packages.

This leads to the following concern: the alpha version was very much about publishing your queries to the cloud. However, I could not find any reference to this scenario, either in the ribbon or in the documentation. I hope this is an oversight on my part and this feature will be supported.

Data Explorer comes as a bunch of .NET DLLs and an interface written in HTML/JavaScript. I really hope the guys at Microsoft will document the libraries, and provide an official entry point so that developers can add custom functionality to the query designer, or build custom apps on top of the engine.

Finally, Data Explorer might even be too good to just be an Excel add-in. In the future, I could very well imagine never using an Excel formula again (at least if queries could be automatically updated upon data entry.) That summarizes how impressed I was with the product.

Equivalent of VLOOKUP in DAX – Part II – using TOPN

In my previous entry, I presented the LOOKUPVALUE function. In this article, I will explain how you can use the TOPN function to mimic the behavior of VLOOKUP when doing “approximate” searches.

Ken Puls's article presents a typical use case for this pattern. Let me start by briefly sketching his problem and describing how he solves it.

The effective tax rate problem

Ken uses data representing sales at a daily level. His problem is to determine the effective tax rate for a given row, when tax rates change over time.

In his article, he uses meaningful sales data. Since the core of the problem is to find the relevant tax rate for a specific date, and I want to focus on that, my ‘Data’ table looks like this:

Date
06/01/2010
07/01/2010
08/01/2010
03/01/2013
04/01/2013
05/01/2013

Here is the ‘Tax Rates’ table. [Start Date] represents the day when a tax rate became effective.

Start Date Tax Rate
07/01/2010 0,12
04/01/2013 0,05

Note that Gerhard Brueckl’s problem is similar: he presents a way to assign the relevant fiscal period to a date in a dimension table. In his model, fiscal periods are defined by their start date as well.

A solution

Ken Puls’ idea is to add a calculated column to his data. This column will contain the value of [Start Date] for the relevant tax rate. He will then create a relationship between the ‘Data’ table and ‘Tax Rates’ table based on this new column.

His calculated column contains the MAX value of [Start Date] over all rows of 'Tax Rates' where 'Tax Rates'[Start Date] is not greater than the current date in the data table. In other words, he replicates the logic of the following Excel formula:

= VLOOKUP( [Date], 'Tax Rates', 1, TRUE )

The DAX expression for a calculated column looks like this:

[Tax Rate Start Date] :=
CALCULATE (
    MAX ( 'Tax Rates'[Start Date] ),
    FILTER ( 'Tax Rates', 'Tax Rates'[Start Date] <= 'Data'[Date] )
)

It filters the ‘Tax Rates’ table, and calculates the max value of ‘Tax Rates'[Start Date] for the filtered table.

Extending the solution (TOPN)

We can use TOPN to mimic the behavior of VLOOKUP.

For an approximate search, we will follow this logic:

  • use a FILTER expression to eliminate irrelevant rows (as in the previous formula)
  • use the TOPN function to filter the result and only return the last row in ‘Tax Rates’
  • return the value in the required column for the row we just found.

Here, "the last row" means the row with the latest [Start Date].

Let us rewrite Ken’s expression using the TOPN function:

[Effective Tax Rate - Start Date] :=
CALCULATE (
    VALUES ( 'Tax Rates'[Start Date] ),
    TOPN (
        1,
        FILTER ( 'Tax Rates', 'Tax Rates'[Start Date] <= 'Data'[Date] ),
        'Tax Rates'[Start Date]
    )
)

The advantage of this pattern is that we can use it to query any column in our lookup table:

[Effective Tax Rate - Rate] :=
CALCULATE (
    VALUES ( 'Tax Rates'[Tax Rate] ),
    TOPN (
        1,
        FILTER ( 'Tax Rates', 'Tax Rates'[Start Date] <= 'Data'[Date] ),
        'Tax Rates'[Start Date]
    )
)

Dealing with ties

There might be a problem with the previous formula: TOPN may return more than one row. This will be the case if there are ties in your data.

In our example, if the ‘Tax Rates’ table has several entries for the same [Start Date], then the VALUES expression might return more than one value. In that case, the expression will fail.

Examples:

'Tax Rates' with a duplicate [Start Date] and the same [Tax Rate]:

Start Date   Tax Rate
07/01/2010   0,12
04/01/2013   0,05
07/01/2010   0,12

The formula still works.

'Tax Rates' with a duplicate [Start Date] and different [Tax Rate] values:

Start Date   Tax Rate
07/01/2010   0,12
04/01/2013   0,05
07/01/2010   0,13

The formula returns an error.

This may be totally fine. Most of the time, this is what we want: if a reference table contains contradictory information, we want to notice it.

However, in some cases, you may want your formula to ignore that, and take one of these values anyway.

In that case, you can add a sort expression to the TOPN expression, so as to guarantee that only one row (or value) is returned.
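For instance, here is a sketch that breaks ties on the [Tax Rate] value itself (the measure name is mine; 0 means descending order):

[Effective Tax Rate - Tie Break] :=
CALCULATE (
    VALUES ( 'Tax Rates'[Tax Rate] ),
    TOPN (
        1,
        FILTER ( 'Tax Rates', 'Tax Rates'[Start Date] <= 'Data'[Date] ),
        'Tax Rates'[Start Date], 0,
        'Tax Rates'[Tax Rate], 0
    )
)

With the second sort expression, two rows sharing the latest [Start Date] but holding different rates no longer tie: the higher rate wins, and VALUES sees a single value. Which rate should win is, of course, a business decision, not a technical one.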

Edit: although this is what I first wrote below, SAMPLE does not behave like a TOPN with the ties removed. How I came to this conclusion is unclear, as I can no longer find the test data I used. Even though it might seem to hold in this specific case, you should not assume it will work every time.

However, you can wrap the TOPN expression within a SAMPLE expression to only get one row (or more if required).

You may also use the SAMPLE function instead of the TOPN function. What does it do? Basically the same as TOPN, except in case of a tie. In that case, SAMPLE will return exactly the required number of rows.

  • Which rows? The engine will decide by itself.
  • Will it always return the same rows? This is not guaranteed.
  • Will the rows returned change on each execution? This is not guaranteed.

As a side note, if you want to write a query that returns exactly 3 rows from your 'Data' table, but want these rows to be randomly drawn on each execution, then you can do the following:

EVALUATE SAMPLE ( 3, 'Data', RAND() )

Equivalent of VLOOKUP in DAX – Part I

One frequent DAX requirement is to write a formula that behaves like the VLOOKUP function in Excel.

Gerhard Brueckl recently blogged about how to map a date to a fiscal period, when fiscal periods are defined in a separate table and each period is defined by its start date. Ken Puls recently wrote an article about calculating an effective tax rate with DAX.

In this first post, I will present the LOOKUPVALUE function.

Quick notes about VLOOKUP

VLOOKUP, a classic Excel function, accepts two values (TRUE or FALSE) for its last argument.

Assuming we defined names for Value, Range, i, the formula will look like this:
=VLOOKUP(Value, Range, i, FALSE)

VLOOKUP will look for the first occurrence of Value in the first column of Range. If the value cannot be found, the formula will return #NA, else it will return the value in the i-th column on the same row.

Note that VLOOKUP also works with arrays.

When the last parameter is TRUE (or omitted), then VLOOKUP requires your data to be sorted in ascending order. It will look for the last value in the first column of Range that is not greater than Value, and will return the value in the i-th column on the same row, or #NA if none exists.

More details on VLOOKUP here, there, or everywhere on the internet.

LOOKUPVALUE

Syntax

DAX V2 introduced a LOOKUPVALUE function that has a related behavior.

A notable difference is that LOOKUPVALUE allows you to provide criteria on several columns. In Excel formulas, this would require you to concatenate columns, or even replace VLOOKUP with some INDEX/MATCH formula.

The syntax looks like this:
= LOOKUPVALUE( Table[OutputColumn], Table[LookupColumn1], "SearchValue1", Table[LookupColumn2], "SearchValue2" )

The syntax looks like the vector form of the LOOKUP function, if you put the order of arguments aside. It actually looks like the syntax of a LOOKUPS function, which does not actually exist.

Basically, you first have to define which column contains your output, then define your first lookup column, then provide the value you are looking for, then define your second lookup column, and so on …

Behavior

LOOKUPVALUE behaves differently from VLOOKUP:

  1. whereas VLOOKUP works with arrays, LOOKUPVALUE does not work with column expressions. In other words, the column arguments must reference columns that physically exist in your model. Also, remember that all the columns must belong to the same table.
  2. if several rows in your table match your criteria, VLOOKUP in Excel will only return the first (or last) match. On the contrary, LOOKUPVALUE in DAX may return an error if several rows match your criteria. This will occur if [OutputColumn] does not contain the same value for all matching rows.
  3. a BLANK value is returned if there is no match.

Also, not surprisingly, you must address columns by names and cannot use indexes to do this.
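As a quick illustration, here is a sketch with two lookup columns (the table and column names are made up for the example):

= LOOKUPVALUE (
    Rates[Tax Rate],          -- output column
    Rates[Country], "FRA",    -- first lookup column and search value
    Rates[Year], 2013         -- second lookup column and search value
)

If all the rows matching both criteria carry the same rate, that rate is returned; if they carry different rates, you get an error; if nothing matches, you get BLANK.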

When to use LOOKUPVALUE?

When your lookup table is a parameter of your model. By that, I mean a static table that you can import into your model.

Obviously, when the behavior described above suits your requirements.

When you want to calculate search values on the fly, or use different search values that come from different tables. In other words, when you cannot use PowerPivot relationships.

Bug in PowerPivot 2012 for Excel 2010

I spotted what I firmly believe to be a bug in PowerPivot. If you have already met this problem and/or can reproduce this issue, please support this ticket on MS Connect.

An MS employee might – someday, maybe – respond to my bug report and say this is "by design". This employee may also advance some pretty good reasons why this is so. However, until then, I cannot conceive of any good reason for this behavior.

Introduction

The data

For the rest of this post, I will use an embedded PowerPivot model based on a linked table. The table is called ‘SomeData’:

Country Product OtherAttribute
USA A Y
FRA B Y
USA C N
FRA C N
UK A N
GER B N
USA A N
FRA B N
FRA D N

What is this bug?

Instead of an elaborate discussion about this bug, let me just start introducing it with a mini-quiz.

Questions

Question 1:

Our model only contains the table ‘SomeData’ presented above. We run the following query against our model.

EVALUATE
CALCULATETABLE (
    VALUES ( 'SomeData' ),
    SUMMARIZE (
        FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
        [Country], [Product]
    )
)

What does the query return?

  1. All rows for which (Country = “USA” and Product = “A”) or (Country = “FRA” and Product = “B”).
  2. All rows for which Country = “USA” or Country = “FRA”
  3. All rows for which Product = “A” or Product = “B”.
  4. It depends

Question 2:

Our model only contains the table ‘SomeData’ presented above. We run the following query against our model.

EVALUATE
CALCULATETABLE (
    'SomeData',
    SUMMARIZE (
        FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
        [Country], [Product]
    )
)

What does the query return?

  1. All rows for which (Country = “USA” and Product = “A”) or (Country = “FRA” and Product = “B”).
  2. All rows for which Country = “USA” or Country = “FRA”
  3. All rows for which Product = “A” or Product = “B”.
  4. It depends

Answers:

Question 1:

1. All rows for which (Country = “USA” and Product = “A”) or (Country = “FRA” and Product = “B”).

Question 2:

4. It depends: the result can be either option 2 or option 3.

Stating the problem

The only difference between the two queries is the presence (or absence) of the VALUES function in the first argument. This means the expression passed as the first argument to CALCULATETABLE changes the way the second argument is evaluated. It is only supposed to work the other way around.

In the second case, the evaluation of the context is incorrect. PowerPivot will only filter on one column. Which one depends on the order of the columns in the ‘SomeData’ Excel table.

As a result, a query might have different results depending on how the workbook was built.

A working example (direct filtering) …

Returning an arbitrary set of tuples

Consider the following query:

EVALUATE
SUMMARIZE (
    FILTER (
        'SomeData',
        ( [Country] = "USA" && [Product] = "A" )
            || ( [Country] = "FRA" && [Product] = "B" )
    ),
    [Country], [Product]
)

This query returns the set of (Country, Product) tuples according to my predicates:

SomeData[Country] SomeData[Product]
USA A
FRA B

Filtering data based on a calculated set

If I want to get all rows from 'SomeData' where the tuple (Country, Product) matches one of the tuples in the previous set, I can use the previous table expression as the second argument of a CALCULATETABLE expression. This gives the following query:

EVALUATE
CALCULATETABLE (
    'SomeData',
    SUMMARIZE (
        FILTER (
            'SomeData',
            ( [Country] = "USA" && [Product] = "A" )
                || ( [Country] = "FRA" && [Product] = "B" )
        ),
        [Country], [Product]
    )
)

The results:

SomeData[Country] SomeData[Product] SomeData[OtherAttribute]
USA A Y
FRA B Y
USA A N
FRA B N

So far so good. Everything works as expected.

… that no longer works (cross-filtering)

Returning an arbitrary set of tuples

Instead of explicitly filtering the (Country, Product) tuples, as above, let us now filter our data on the [OtherAttribute] column.

EVALUATE
SUMMARIZE (
    FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
    [Country], [Product]
)

As before, this returns:

SomeData[Country] SomeData[Product]
USA A
FRA B

Filtering data based on a calculated set

Let us plug our new SUMMARIZE expression into our CALCULATETABLE expression.

EVALUATE
CALCULATETABLE (
    'SomeData',
    SUMMARIZE (
        FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
        [Country], [Product]
    )
)

Unexpectedly, this time, the query returns the following results:

SomeData[Country] SomeData[Product] SomeData[OtherAttribute]
USA A Y
FRA B Y
UK A N
GER B N
USA A N
FRA B N

Note that the engine now returns all rows from ‘SomeData’ for which [Product] is in { “A”, “B” }. This is confirmed by SQL Server Profiler.

Replacing ‘SomeData’ with VALUES(‘SomeData’), however, will return the expected results.
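In other words, the following query (the same as in Question 1) returns only the four expected rows:

EVALUATE
CALCULATETABLE (
    VALUES ( 'SomeData' ),
    SUMMARIZE (
        FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
        [Country], [Product]
    )
)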

Reordering the columns

Worse, let us just change the order of the columns in our Excel table.

Just drag and drop the [Country] column to the left of the [Product] column, and refresh the PowerPivot model. We have not made any changes to the PowerPivot model itself.

Let us run our previous query again.

The results now become:

Country Product OtherAttribute
USA A Y
FRA B Y
USA C N
FRA C N
USA A N
FRA B N
FRA D N

This means the query now only filters by [Country]. Once again, this is confirmed by SQL Server Profiler.

When a filter context is available

Finally, let us add another CALCULATETABLE expression, so that our query becomes:

EVALUATE
CALCULATETABLE (
    CALCULATETABLE (
        'SomeData',
        SUMMARIZE (
            FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
            [Country], [Product]
        )
    ),
    'SomeData'[Country] = "FRA"
)

'SomeData' is first filtered by [Country], which only returns the rows for France. These rows should then be filtered according to the value in [OtherAttribute], then summarized by [Country] and [Product]. The one remaining tuple, ("FRA", "B"), should then be used to filter 'SomeData' – overriding our initial filter. This is not the case: here, the rows only get filtered by [Country].

Once again, replacing ‘SomeData’ in the first argument of the innermost CALCULATETABLE expression with VALUES(‘SomeData’) will produce the expected results.

In that case, however, the result of the query does not seem to depend on the order of the columns in the Excel table.

Measures display the same behavior

Example with COUNTROWS

Do measures display the same behavior? Unfortunately, yes.

Let us consider this measure:

[Fine] :=
CALCULATE (
    COUNTROWS ( VALUES ( 'SomeData' ) ),
    SUMMARIZE (
        FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
        [Country], [Product]
    )
)

This measure works as expected:

[Fine]   A   B   Total
FRA          2   2
USA      2       2
Total    2   2   4

For the same reasons as in the previous sections, this measure does not:

[Not fine] :=
CALCULATE (
    COUNTROWS ( 'SomeData' ),
    SUMMARIZE (
        FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
        [Country], [Product]
    )
)

The result:

[Not fine]   A   B   Total
FRA              2   4
USA          2       3
Total        3   3   6

Once again the grand total may change depending on the order of the columns in the linked table.

Example with SUM

You might think: “Not an issue. I always use VALUES when referencing tables or columns”.

Do you?

Standard aggregate functions like COUNT, COUNTA, SUM, … only accept a direct reference to a column.

To check this, let us add a new column to our table:

Country Product OtherAttribute CheckColumn
USA A Y 100000000
FRA B Y 20000000
USA C N 3000000
FRA C N 400000
UK A N 50000
GER B N 6000
USA A N 700
FRA B N 80
FRA D N 9

Now, let us observe the same unexpected behavior.

[Sum fails] :=
CALCULATE (
    SUM ( 'SomeData'[CheckColumn] ),
    SUMMARIZE (
        FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
        [Country], [Product]
    )
)

The results:

[Sum fails]   A           B          Total
FRA                       20000080   20400089
USA           100000700              103000700
Total         100050700   20006080   120056780

Note that it should be easy to track which rows are in the context for each cell:

For example, the total for FRA is 20400089. This clearly indicates that the last two rows in our previous table (where [CheckColumn] = 80 and 9) were included in the computation.

Here is one way to make it work as expected.

[Sum works] :=
CALCULATE (
    CALCULATE ( SUM ( 'SomeData'[CheckColumn] ), VALUES ( 'SomeData' ) ),
    SUMMARIZE (
        FILTER ( 'SomeData', [OtherAttribute] = "Y" ),
        [Country], [Product]
    )
)

[Sum works]   A           B          Total
FRA                       20000080   20000080
USA           100000700              100000700
Total         100000700   20000080   120000780

Attribute relationships and cube space in SSAS

Defining attribute relationships on your dimensions will improve processing and querying performance, but might also affect the results of your queries, sometimes leading to unexpected results.

This has already been discussed by Jeffrey Wang and other bloggers but I figured a new treatment of this topic might be helpful (at least to me).

Without attribute relationships.

Let us take the Date dimension as an example. It has [Year],[Month], and [Date] attributes and contains data for the years 2011-2013. No attribute relationship has been defined yet.

The cube has a measure called [# Days] which represents the number of days for a given period.

In the following, I will not explicitly mention the dimension when writing tuples. This is for the sake of readability. This is of course not recommended in production.

The following table shows what happens when you overwrite the context in derived measures.

 
Year  Month    [# Days]  ([# Days], [Month].[All])  ([# Days], [Year].[All])  ([# Days], [Month].&[2013/01])  ([# Days], [Year].&[2013])
2013  2013/01  31        365                        31                        31                              31
2013  All      365       365                        1096                      31                              365
All   All      1096      1096                       1096                      31                              365
All   2013/01  31        1096                       31                        31                              31

The first two columns describe what members are being queried on rows; the other columns show the result of the corresponding measure definition.

The rule in the case where no relationship has been defined is quite simple: overwriting an attribute has no impact on unrelated attributes.
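For example, before any relationship is defined, the following query (a sketch; [My Cube] stands for whatever your cube is called) returns 365: overwriting [Month] with its [All] member has no impact on the unrelated [Year] attribute.

WITH MEMBER [Measures].[Days - All Months] AS
    ( [Measures].[# Days], [Date].[Month].[All] )
SELECT
    [Measures].[Days - All Months] ON COLUMNS,
    ( [Date].[Year].&[2013], [Date].[Month].&[2013/01] ) ON ROWS
FROM [My Cube]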

Attribute relationships modify cube space

Now, suppose we define a relationship [Month]->[Year] in SSAS. [Month] is the source attribute, and [Year] the related attribute.

This relationship tells SSAS there is a n:1 relationship between [Month] and [Year]. In other words, one month can only belong to one year.

Setting [Month] to a specific member, say [2013/01] in your query, will effectively set [Year] to the matching member (in this case, [2013]).

The following table shows the effective context that results from what you specified in your query.

Query            Effective
Year  Month      Year  Month
All   All        All   All
All   2013/01    2013  2013/01
2013  All        2013  All
2013  2013/01    2013  2013/01
2012  All        2012  All
2012  2012/02    2012  2012/02

When you specify the tuple ( [Year].[All], [Month].&[2013/01] ) in your query, SSAS will actually evaluate ( [Year].[2013], [Month].&[2013/01] ), even though you explicitly requested [Date].[Year].[All].

This might not seem worth mentioning, but it is. It becomes quite important when you add further attributes and relationships to the [Month] attribute.

Partial tuples in measures.

Overwriting context in a measure would be simple, if you only ever provided complete tuples. However, providing complete tuples would make measures hard to maintain and read. When you specify a partial tuple in a measure, you only specify a few attributes. SSAS implicitly provides members for the attributes you did not include in your tuple.

Overwriting source attributes.

When you set an attribute to a specific member (e.g. [2013/01]), its related attributes are implicitly set to their corresponding member (e.g. [2013]).

The rule when setting an attribute to its [All] member is less straightforward, and can be expressed this way:

  1. if the current member is [All], current members of related attributes will be kept
  2. otherwise related attributes will be set to [All]

In the table below, the rows where case #1 applies are marked with an asterisk.

Effective          ( [Month].[All] )    ( [Month].[2013/01] )
Year  Month        Year  Month          Year  Month
All   All        * All   All            2013  2013/01
2013  All        * 2013  All            2013  2013/01
2013  2013/01      All   All            2013  2013/01
2012  All        * 2012  All            2013  2013/01
2012  2012/02      All   All            2013  2013/01

Overwriting related attributes.

Setting a related attribute (e.g. [Year]) in a measure, while omitting the source attribute, will modify the context by implicitly setting its source attribute (e.g. [Month]) to its [All] member.

This simple rule can be visualized in the table below.

Effective          ( [Year].[All] )     ( [Year].[2013] )
Year  Month        Year  Month          Year  Month
All   All          All   All            2013  All
2013  All          All   All            2013  All
2013  2013/01      All   All            2013  All
2012  All          All   All            2013  All
2012  2012/02      All   All            2013  All

Context propagation.

Although it might look so at first sight, writing ( [Year].[All] ) in a measure is not the same as writing ( [Year].[All], [Month].[All] ).

Suppose we add new attributes to our date dimension ([Month In Year] and [Day In Month]), and add the appropriate relationships.

Our attribute relationships now look like this:

  • [Date] –> [Day In Month]
  • [Date] –> [Month] –> [Year]
  • [Date] –> [Month] –> [Month In Year]

Explicitly overwriting an attribute in a measure implicitly overwrites its source and related attributes. Implicitly overwriting an attribute, however, only implicitly overwrites its source attributes.

So, if we write ( [Year].[All] ) in a measure, this will not overwrite [Month In Year], or [Day In Month].

The table below summarizes what we obtain for different tuples:

Tuple                                           Year  Month  Month In Year   Date  Day In Month
( [Year].[All] )                                All   All    Current member  All   Current member
( [Year].[All], [Month].[All] )                 All   All    All             All   Current member
( [Year].[All], [Month].[All], [Date].[All] )   All   All    All             All   All

Also, I mentioned earlier that relationships modify cube space. Suppose you have the following query:

WITH MEMBER [Measures].[My Measure] AS
    ( [Measures].[# Days], [Date].[Year].[2012] )
SELECT
    [My Measure] ON COLUMNS,
    ( [Date].[Month].[2013/01] ) ON ROWS
FROM [My Cube]

The result will be 31.

The reason behind this is that writing ( [Date].[Month].[2013/01] ) on rows is the same as writing:

( [Year].[2013], [Month].[2013/01], [Month In Year].[1], [Date].[All], [Day In Month].[All]  )

Since overwriting [Year] will not overwrite [Month In Year], the measure [# Days] will be calculated for:

( [Year].[2012], [Month].[All],     [Month In Year].[1], [Date].[All], [Day In Month].[All]  )

Conclusion

When overwriting attributes in a measure, you had better not rely too much on implicit members. Rather, use the ROOT function wherever appropriate. Check your relationship tree. Check your measures against every sibling attribute in the relationship tree (or other descendants of ascendant nodes).

If you want to override any filter on the [Date] level, including any filter coming through [Day In Month] or other related attributes, then include the [Date].[Date].[All] member in your measure.

If you want to override the filter on the [Month] attribute, as well as any filter on the [Date] attribute, but keep the current filters on [Year] and [Month In Year], then use:

( [Year].CURRENTMEMBER, [Date].[All] )
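And if you want to override every attribute of the Date dimension at once, here is a sketch using ROOT (the measure name is mine; Root([Date]) returns the tuple made of the [All] members of the whole dimension):

WITH MEMBER [Measures].[# Days - All Dates] AS
    ( [Measures].[# Days], Root( [Date] ) )
SELECT
    [Measures].[# Days - All Dates] ON COLUMNS,
    ( [Date].[Month].[2013/01] ) ON ROWS
FROM [My Cube]

Whatever sits on rows, this measure should return 1096, since no filter on the [Date] dimension survives the Root call.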

Notes on “Usage of dates argument in a row context” by Marco Russo

Marco Russo recently blogged about the usage of dates arguments in a row context. The measure he presents as producing "wrong results" actually produces the results I expected. For this reason, I had a hard time understanding the motivation behind his article, and decided to write a follow-up in case anyone else missed his very subtle explanations about context in DAX expressions. If you have not done so yet, I recommend you read his entry, before and/or after reading mine.

So, suppose you have a table called OrderDate with a Date column and one row for each day, and define the following measures:

LastDay := LASTDATE ( OrderDate[Date] )

NumLastDay :=
COUNTROWS (
    FILTER (
        OrderDate,
        OrderDate[Date] = [LastDay]
    )
)

If you evaluate the NumLastDay measure for the month of January, for example, you will get 31 as a result. As already explained by Marco, FILTER iterates the OrderDate table: it creates a row context for every row of the OrderDate table present in the current filter context. The measure LastDay is then evaluated for every single row (day); because referencing a measure implies a context transition, each row becomes its own filter context, so LASTDATE returns that row's date. Hence, the expression OrderDate[Date] = [LastDay] will always return TRUE.

Contrary to Marco, I was expecting this. Also, you can get the same result if you define LastDay as
LastDay := MAX ( OrderDate[Date] )
That is why I did not immediately get his point, or see what was so special about the LASTDATE function.

However, LASTDATE and MAX behave differently when used directly within the FILTER formula:

DirectCall - MAX :=
COUNTROWS (
    FILTER (
        OrderDate,
        OrderDate[Date] = MAX ( OrderDate[Date] )
    )
)

will return 1 when evaluated for the month of January. The evaluation of OrderDate[Date] within the MAX expression is done for the current filter context, disregarding the row context created by the FILTER expression.

On the contrary,

DirectCall - LASTDATE :=
COUNTROWS (
    FILTER (
        OrderDate,
        OrderDate[Date] = LASTDATE ( OrderDate[Date] )
    )
)

will return 31 when evaluated for the month of January.

The filter expression

OrderDate[Date] = LASTDATE ( OrderDate[Date] )

actually behaves like the following expressions:

OrderDate[Date] = CALCULATE ( MAX ( OrderDate[Date] ) )
OrderDate[Date] = CALCULATE ( MAXX ( OrderDate, [Date] ) )
OrderDate[Date] = MAXX ( CALCULATETABLE ( OrderDate ), [Date] )
OrderDate[Date] = CALCULATE ( LASTDATE ( OrderDate[Date] ) )
OrderDate[Date] = LASTDATE ( CALCULATETABLE ( OrderDate[Date] ) )

where the CALCULATE and CALCULATETABLE parts create a new filter context from the current row context, for the evaluation of OrderDate[Date].

That is the point Marco Russo makes in his article about the LASTDATE function. He presents two solutions to calculate a measure for the last day within the current filter context (see his article).

Here is mine, by the way:

= CALCULATE (
    COUNTROWS ( OrderDate ),
    LASTDATE ( OrderDate[Date] )
)
