Notes on “Usage of dates argument in a row context” by Marco Russo

Marco Russo recently blogged about the usage of dates arguments in a row context. The measure he presents as producing “wrong results” actually produces the results I expected. For this reason, I had a hard time understanding the motivation behind his article and decided to write a follow-up, in case anyone else missed his very subtle explanations about context in DAX expressions. If you have not done it yet, I recommend you read his entry, before and/or after reading mine.

So, suppose you have a table called OrderDate with a Date column and one row for each day, and define the following measures:

LastDay := LASTDATE(OrderDate[Date])

NumLastDay := COUNTROWS(
    FILTER(OrderDate,
        OrderDate[Date] = [LastDay]
    )
)

If you evaluate the measure NumLastDay for the month of January, for example, you will get 31 as a result. As already explained by Marco, FILTER iterates the OrderDate table: it creates a row context for every row of the OrderDate table present in the current filter context. The measure LastDay is then evaluated for every single row (day); because a measure reference implies a CALCULATE, the row context is turned into a filter context containing only that row, so LastDay returns the date of the current row. Hence, the expression OrderDate[Date] = [LastDay] will always return TRUE.
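
To make the implicit CALCULATE explicit, here is a minimal sketch (the measure name NumLastDayExplicit is mine) that behaves exactly like NumLastDay and also returns 31 for January:

NumLastDayExplicit := COUNTROWS(
    FILTER(OrderDate,
        OrderDate[Date] = CALCULATE(LASTDATE(OrderDate[Date]))
    )
)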

Contrary to Marco, I was expecting this. Also, you can get the same result if you define LastDay as
LastDay := MAX(OrderDate[Date])
That is why I did not immediately get his point, or see what was so special about the LASTDATE function.

However, LASTDATE and MAX behave differently when used directly within the FILTER formula:

DirectCall - MAX := COUNTROWS(
    FILTER(OrderDate,
        OrderDate[Date] = MAX(OrderDate[Date])
    )
)

will return 1 when evaluated for the month of January. The evaluation of OrderDate[Date] within the MAX expression is done for the current filter context, disregarding the row context created by the FILTER expression.

On the contrary,

DirectCall - LASTDATE := COUNTROWS(
    FILTER(OrderDate,
        OrderDate[Date] = LASTDATE(OrderDate[Date])
    )
)

will return 31 when evaluated for the month of January.

The filter expression
OrderDate[Date] = LASTDATE(OrderDate[Date])
actually behaves like the following expressions:
OrderDate[Date] = CALCULATE(MAX(OrderDate[Date]))
OrderDate[Date] = CALCULATE(MAXX(OrderDate, OrderDate[Date]))
OrderDate[Date] = MAXX(CALCULATETABLE(OrderDate), OrderDate[Date])
OrderDate[Date] = CALCULATE(LASTDATE(OrderDate[Date]))
OrderDate[Date] = LASTDATE(CALCULATETABLE(VALUES(OrderDate[Date])))

where the CALCULATE and CALCULATETABLE parts create a new filter context from the current row context, for the evaluation of OrderDate[Date].

That is the point Marco Russo makes in his article about the LASTDATE function. He presents two solutions to calculate a measure for the last day within the current filter context (see his article).

Here is mine, by the way:

=CALCULATE(
COUNTROWS(OrderDate),
LASTDATE(OrderDate[Date])
)

Analytic functions in SQL Server 2012 – Part II

For the rest of this article, suppose you have a table describing events, with the following columns:

  • EmployeeID: the ID of the employee who triggered the event
  • ItemID: the ID of the item for which the event was triggered
  • EventDateTime: when the event occurred.

FIRST_VALUE / LAST_VALUE

These functions return the first or last value within the specified partition. “First” and “last” are defined according to the ORDER BY specified in the OVER clause.

To get the ID of the employee who created an item (first event), you could use the following expression:

FIRST_VALUE(EmployeeID) OVER (PARTITION BY ItemID ORDER BY EventDateTime ASC)

The expression will return the value of EmployeeID for the row that has the lowest EventDateTime among all rows having the same value for ItemID.
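
As a complete query, this could look like the following sketch; the table name Events is an assumption of mine, the columns are the ones listed above:

SELECT
    ItemID,
    EmployeeID,
    EventDateTime,
    -- employee who triggered the earliest event for this item
    FIRST_VALUE(EmployeeID) OVER (
        PARTITION BY ItemID
        ORDER BY EventDateTime ASC
    ) AS CreatorEmployeeID
FROM Events;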

LAST_VALUE fits a similar role, but beware of the default window frame:

When no ROWS or RANGE clause has been specified in the OVER clause, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used.

Does anybody see any use for using LAST_VALUE with this window frame?
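
If what you are after is the actual last value over the whole partition, you can override the default frame explicitly. A sketch, again assuming the hypothetical Events table:

LAST_VALUE(EmployeeID) OVER (
    PARTITION BY ItemID
    ORDER BY EventDateTime
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)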

LAG / LEAD

LAG and LEAD allow you to get information from the n-th row before or after the current “row”.

The following expression returns the date time of the next event for the current ItemID:

LEAD( EventDateTime, 1, GETDATE() )  OVER (PARTITION BY ItemID ORDER BY EventDateTime)

LAG and LEAD do not accept a window frame (no ROWS or RANGE clause is allowed).
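
For instance, here is a sketch computing the time elapsed since the previous event for the same item, still assuming the hypothetical Events table:

SELECT
    ItemID,
    EventDateTime,
    -- NULL for the first event of each item, since LAG defaults to NULL
    DATEDIFF(
        MINUTE,
        LAG(EventDateTime) OVER (PARTITION BY ItemID ORDER BY EventDateTime),
        EventDateTime
    ) AS MinutesSincePreviousEvent
FROM Events;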

Are these functions deterministic?

When you use these functions, remember the following: if the value you wish to return is not included in the ORDER BY clause, rows with different values may have the same order. In that case, you cannot guarantee which value will be returned.

Consider the following example:

WITH DATA (partitionID, orderingValue, Value) AS (
    SELECT 1, 100, 'VAL1' UNION ALL
    SELECT 1, 100, 'VAL2' UNION ALL
    SELECT 2, 100, 'VAL2' UNION ALL
    SELECT 2, 100, 'VAL1'
)
SELECT *
INTO #TMP
FROM DATA;

SELECT *
, FIRST_VALUE(Value) OVER (PARTITION BY partitionID ORDER BY orderingValue)
FROM #TMP;

CREATE CLUSTERED INDEX myTempIndex
ON #TMP (partitionID, orderingValue, Value);

SELECT *
, FIRST_VALUE(Value) OVER (PARTITION BY partitionID ORDER BY orderingValue)
FROM #TMP;

DROP INDEX myTempIndex ON #TMP;
DROP TABLE #TMP;

Creating a clustered index on the table changed the outcome of the query.

Even without changing an index, I suspect a change in the execution plan could also lead to such differences.
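
One way to make the result deterministic is to break ties explicitly, for instance by adding the returned column itself to the ORDER BY of the query above:

SELECT *
    , FIRST_VALUE(Value) OVER (PARTITION BY partitionID ORDER BY orderingValue, Value)
FROM #TMP;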

Analytic functions in SQL Server 2012 – Part I – Distributions

In this article, I take a look at the analytic functions in SQL Server 2012 to handle frequency distributions.

CUME_DIST

The CUME_DIST function returns, for each row, the fraction of rows whose ordering value is lower than or equal to the current row's.

The expression
CUME_DIST() OVER (ORDER BY MyValue)
is equivalent (neglecting precision) to:
1.0 * COUNT(*) OVER (ORDER BY MyValue
                     RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
/ COUNT(*) OVER ()
(RANGE rather than ROWS, so that rows tied on MyValue are counted together, as CUME_DIST does.)

Possible scenario: calculate the percentage of households whose income is not greater than the current one.
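
A sketch of that scenario, using a hypothetical Households(HouseholdID, Income) table:

SELECT
    HouseholdID,
    Income,
    -- fraction of households earning no more than the current one
    CUME_DIST() OVER (ORDER BY Income) AS PctEarningNoMore
FROM Households;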

PERCENT_RANK

The PERCENT_RANK function is similar to the CUME_DIST function.

The expression
PERCENT_RANK() OVER (ORDER BY MyValue)

is equivalent to (neglecting integer division):

( RANK() OVER (ORDER BY MyValue) - 1 )
/ ( COUNT(*) OVER () - 1 )

Possible scenario: for each household, calculate the percentage of the other households that earn less than the current one.
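
A sketch comparing the function with the manual formula, using the same hypothetical Households table:

SELECT
    HouseholdID,
    Income,
    PERCENT_RANK() OVER (ORDER BY Income) AS PctRank,
    -- multiplied by 1.0 to avoid integer division
    1.0 * (RANK() OVER (ORDER BY Income) - 1)
        / (COUNT(*) OVER () - 1) AS PctRankManual
FROM Households;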

PERCENTILE_DISC

Returns the smallest value such that CUME_DIST is equal to or greater than the provided probability.

PERCENTILE_DISC (0.4)
WITHIN GROUP ( ORDER BY MyValue ASC )
OVER()

A few remarks:

  • NULL values are ignored, although this is not the case for the CUME_DIST function;
  • The OVER clause is mandatory, although it may be empty;
  • No ORDER BY is allowed in the OVER clause; there is a specific WITHIN GROUP clause to specify the ordering of the partition.

Possible scenario: what is the income under which 10% of households fall?
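
A sketch for that question, again with the hypothetical Households table (the same value is repeated on every row, hence the DISTINCT):

SELECT DISTINCT
    PERCENTILE_DISC(0.1) WITHIN GROUP (ORDER BY Income) OVER () AS IncomeFirstDecile
FROM Households;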

PERCENTILE_CONT

This is an interpolated version of PERCENTILE_DISC. It shares the same syntax. The same remarks as above apply.
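
For instance, a sketch contrasting the discrete and interpolated medians, with the same hypothetical Households table:

SELECT DISTINCT
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Income) OVER () AS MedianDisc,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Income) OVER () AS MedianCont
FROM Households;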

Fun facts about Transact-SQL syntax

Magic COUNT

The following query is valid:

SELECT COUNT(*)

It returns 1. Can someone see why?

GROUP BY nothing

The following group by clause is valid:

GROUP BY ()

It groups by “nothing”. A query with such a GROUP BY clause will only compile if its select list contains only aggregate expressions. As a result, there is barely any point in using it, except maybe if you build your SQL dynamically.
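
A sketch against a system view, so you can run it anywhere:

SELECT COUNT(*) AS ObjectCount, MAX(modify_date) AS LastModification
FROM sys.objects
GROUP BY ();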

OVER the whole set

If you need aggregates over a complete set, you can use one of the following OVER clauses:

OVER (PARTITION BY 1)

OVER ()
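
For example, the following sketch adds a grand total next to each detail row without collapsing the rows:

SELECT
    name,
    type_desc,
    -- same total repeated on every row
    COUNT(*) OVER () AS TotalObjectCount
FROM sys.objects;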

ALL and ANY

ALL and ANY cannot be used on the left-hand side of a scalar comparison.

This predicate is valid:

1 = ALL( SELECT 1)

This one is not:

ALL( SELECT 1) = 1
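
A minimal sketch to try both forms:

-- Valid: the scalar is on the left-hand side of the comparison
SELECT 'ok' WHERE 1 = ALL (SELECT 1);

-- Syntax error: ALL cannot appear on the left-hand side
-- SELECT 'ok' WHERE ALL (SELECT 1) = 1;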

DAX Queries in Excel 2013: using ROLLUP to cover some UNION scenarios

DAX does not offer a UNION function or operator. There are a few scenarios where this would be useful.

The scenario

Consider the following table called Games.

[image: the Games table, with its two columns Player 1 and Player 2]

This is a very simple table. Basically, it represents a relation between players, which can be stated as “Player 1 played against Player 2”. For now, I will ignore why one player ends up in one column rather than the other.

In order to analyze this data, we will need a Player dimension, representing the set of all players, regardless of whether they were labeled as Player 1 or Player 2.

If the Games table came from a SQL connection, then we might as well use the following query to get the desired result.

SELECT [Player 1] AS Player FROM Games
UNION
SELECT [Player 2] FROM Games

The table would look like this.

[image: the resulting Player table, with one row per distinct player]

(Remember that UNION only returns distinct rows).

A DAX solution

I already mentioned that Excel 2013 supports DAX queries, and that the resulting tables can be re-used in the data model. Let us take a look at what we need to achieve the desired result.

The first thing we need to take care of is that our result set might have more entries than our Games table: we need at least one row per distinct value in [Player 1] and one row per distinct value in [Player 2].

Now, imagine you create a PivotTable based on the Games table and put [Player 1] and [Player 2] on rows. This is what you will get:

[image: PivotTable with Player 1 and Player 2 on rows, with a sub-total row for each value of Player 1]

For every distinct value of [Player 1], Excel will create a group, list all the values for [Player 2] within this group, and will add a row for the sub-total.

Since we cannot use the body range of a pivot table as a data source, we will do this with a DAX query.

SUMMARIZE and ROLLUP will allow us to mimic this behavior.

The following DAX statement

EVALUATE
SUMMARIZE(
    Games
    , [Player 1]
    , ROLLUP( [Player 2] )
    , "Player"
    , IF( ISSUBTOTAL( Games[Player 2] ), [Player 1], [Player 2] )
)

will return the following table:

[image: the result of the query, with the Player 1, Player 2, and Player columns]

Note that ROLLUP([Player 2]) instructs PowerPivot to generate a sub-total row for each value of [Player 1]. We also create a calculated column named Player, which takes the value of [Player 2] for “regular” rows and the value of [Player 1] for sub-total rows; ISSUBTOTAL allows us to differentiate sub-total rows from rows where [Player 2] is blank.

Finally, we just want to get the unique values in the Player column. For that, we will use, once again, the SUMMARIZE function.

The final query

EVALUATE
SUMMARIZE(
    SUMMARIZE(
        Games
        , [Player 1]
        , ROLLUP( [Player 2] )
        , "Player"
        , IF( ISSUBTOTAL( Games[Player 2] ), [Player 1], [Player 2] )
    )
    , [Player]
)

DAX Queries in Excel 2013

The “old” drill-through trick that could be used in Office 2010 and PowerPivot v2 no longer works in Excel 2013. You can no longer edit the connection for the resulting table.

Luckily, this is no longer required.

If you want to include a DAX query based on the model in your workbook, you can proceed as follows:

  • Go to Data > Existing Connections (EDIT: from the regular Excel ribbon, not the PowerPivot window.)
  • Select a table
  • When asked how you want to view your data, select “Table”
  • To edit the DAX query, right click on the table, then choose Table > Edit DAX …

If, like me, you get the error message “query could not run or the data model could not be accessed. Query (row, position) cannot reference a base table in the EVALUATE query, because it is undetermined”, check whether your query is correct.

The following query, for example, would run smoothly for the appropriate table.

EVALUATE
SUMMARIZE(
    Table
    , ROLLUP( Table[Category], Table[Subcategory] )
    , "Sales", COUNTA( Table[Subcategory] )
)

Note that in this mode, the EN-US locale seems to be in use.

Finally, you can reuse the table in your model, if you wish. I have not tested how this affects the updating of the model, though.

Slicers and Drill-through in Excel

While I was playing around with Excel 2013, I noticed some inconsistency in the way slicers behave depending on whether they are used to filter a PowerPivot table, or whether they are used to filter a traditional pivot table.

With traditional pivot tables, slicers do not affect drill-through if they are applied on hidden fields. In other words, if you use a slicer to filter Field1, then you must place Field1 somewhere in your table (row, column, or page) in order for the drill-through to behave as expected.

This is not the case with PowerPivot tables. Drill-through works as expected.
