Fun facts about Transact-SQL syntax

Magic COUNT

The following query is valid:

SELECT COUNT(*)

It returns 1. Can someone see why?

GROUP BY nothing

The following group by clause is valid:

GROUP BY ()

It groups by “nothing”. A query with such a GROUP BY clause will only compile if it only contains aggregate expressions. As a result, there barely is a point in using it, except maybe if you dynamically build your SQL.

OVER the whole set

If you need aggregates over a complete set, you can use one of the following OVER clauses:

OVER (PARTITION BY 1)

OVER ()

ALL and ANY

ALL and ANY cannot be used on the left-hand side of a scalar comparison.

This query is valid:

1 = ALL( SELECT 1)

This query is not :

ALL( SELECT 1) = 1

DAX Queries in Excel 2013 : using ROLLUP to cover some UNION scenarios

DAX does not offer a UNION function or operator. There are a few scenarios where this would be useful.

The scenario

Consider the following table called Games.

image

This is a very simple table. Basically, this table represents a relation between players, which can be stated as “Player 1 played against Player 2”. For now, I will ignore the meaning of why one player is listed in one column or the other.

In order to analyze this data, we will need a Player dimension, representing the set of all players, regardless of whether they were labeled as Player 1 or Player 2.

If the Games table came from a SQL connection, then we might as well use the following query to get the desired result.

SELECT [Player 1] AS Player FROM Games
UNION
SELECT [Player 2] FROM Games

The table would look like this.

image

(Remember that UNION only returns distinct rows).

A DAX solution

I already mentioned Excel 2013 supports DAX queries,  and the resulting tables can be re-used in the data model. Let us take a look at what we need to achieve the desired result.

First thing we need to care about, is that our result set might have more entries than our Games table. We will need to generate at least one row per distinct value in player 1 and 1 row per distinct value in player 2.

Now, imagine you create a PivotTable based on the Games table and put [Player 1] and [Player 2] on rows. This is what you will get:

image

For every distinct value of [Player 1], Excel will create a group, list all the values for [Player 2] within this group, and will add a row for the sub-total.

Since we cannot use the body range of a pivot table, as a data source, we will do this with a DAX query.

SUMMARIZE and ROLLUP will allow us to mimic this behavior.

The following DAX statement

EVALUATE
SUMMARIZE(
Games
, [Player 1]
, ROLLUP(  [Player 2])
, “Player”
, IF (ISSUBTOTAL(Games[Player 2]),[Player 1] , [Player 2])
)

will return the following table:

image

Note that ROLLUP([Player 2]) instructs PowerPivot to generate a subtotal row for the current value of [Player 1]. We also create a calculated column named Player, that will just take the value of [Player 2] for “regular” rows, and the value of [Player 1] for sub-totals. Also, ISSUBTOTAL allows us to differentiate subtotal rows from rows where [Player 2] is blank.

Finally, we just want to get the unique values in the Player column. For that, we will use, once again, the SUMMARIZE function.

The final query

EVALUATE
SUMMARIZE(
SUMMARIZE(
Games
, [Player 1]
, ROLLUP(  [Player 2])
, “Player”
, IF (ISSUBTOTAL(Games[Player 2]),[Player 1] , [Player 2])
)
,  [Player]
)

DAX Queries in Excel 2013

The “old” drill-through trick that could be used in Office 2010 and PowerPivot v2 no longer works in Excel 2013. You can no longer edit the connection for the resulting table.

Luckily, this is no longer required.

If you want to include a DAX query based on the model in your workbook, you can proceed as follows:

  • Go to Data > Existing Connections (EDIT: from the regular Excel ribbon, not the PowerPivot window.)
  • Select a table
  • When asked how you want to view your data, select “Table”
  • To edit the DAX query, right click on the table, then choose Table > Edit DAX …

If, like me, you get the error message “query could not run or the data model could not be accessed. Query (row, position) cannot reference a base table in the EVALUATE query, because it is undetermined”, check whether your query is correct.

The following query, for example, would run smoothly for the appropriate table.

EVALUATE
SUMMARIZE(Table,
ROLLUP( Table[Category],
Table[Subcategory]),
“Sales”, COUNTA( Table[Subcategory])

)

Note that in this mode, the EN-US locales seem to be in use.

Finally, you can reuse the table in your model, if you wish. I have not tested how this affects the updating of the model, though.

Slicers and Drill-through in Excel.

While I was playing around with Excel 2013, I noticed some inconsistency in the way slicers behave depending on whether they are used to filter a PowerPivot table, or whether they are used to filter a traditional pivot table.

With traditional pivot tables, slicers do not affect drill-through if they are applied on hidden fields. In other words, if you use a slicer to filter Field1, then you must place Field1 somewhere in your table (row, column, or page) in order for the drill-through to behave as expected.

This is not the case with PowerPivot tables. Drill-through works as expected.

What’s new for Excel 2013 – Personal observations

The customer preview of Office 2013 is finally available, and there is a lot of new things to look at, be it for users or for developers.

Here are a few personal observations based on my first look at the product.

  • PowerPivot for Office 2013  now comes bundled with Excel, making Office On Demand, the evolution of the Click-To-Run technology, much more viable.
  • The PowerPoint add-in is no longer required to build simple data models and pivot tables based on several data tables. Measures, KPIs and some other features will require the add-in to be activated.
  • The message “To use multiple table in your analysis, a new PivotTable needs to be created based on the Data Model.” hints at the fact that old-school pivot tables are not dead yet and still are the default PivotTable type.
  • New table objects in your workbook will automatically be added to your data model.
  • You can create a pivot table based on a simple range and then transform the pivot table into a “PowerPivot model”. The initial range will not be transformed into a table object. Further ranges can be added through the Connections manager. (What for?)
  • The data model can now be accessed from VBA. The corresponding object is called Model (not DataModel), despite what the MSDN documentation mentions about it.
  • The PowerPivot add-in will not recognize tables that were added directly to the data model from Excel as linked tables. However, a new connection will be created in Excel, so that refactoring can be done from the Connections manager. A corresponding object called WorksheetDataConnection is available in VBA.
  • You can create two worksheet data connections for the same range. Duplicating ranges for scenarios where the same data must be used twice (for example, when a dimension must play different roles) is no longer required.
  • The function FILTERXML allows you to query an XML document with XPath. The WEBSERVICE function returns an XML text from a web service. ENCODEURL is a helper function to encode a string into a valid URL. All three functions are available in VBA through the WorksheetFunction object.
  • The function FILTERXML returns an array. You can use it in array formulas, or in conjunction with the INDEX function, for example.
  • New functions like SHEET, and SHEETS are available, although it is still unclear to me in which scenarios they will be useful, since no specific function takes a sheet index as a parameter.
  • ISFORMULA and FORMULATEXT are also new in Excel 2013.
  • Examples in the help system are provided as embedded Excel Web App workbooks. This may solve some translation issues that have occurred in the past for non-English Office versions.
  • Internet Explorer is used to navigate the VBA help. You can now alt-tab between the VBA environment and the documentation.

Downloading multiple files from Skydrive

I have just downloaded a few PowerPoint sample files from Skydrive.

To keep track of the downloaded files, I created a folder, as I usually do.

You know what? I did not need to do that.

Skydrive creates a single zip archive containing all the selected files.

A simple but very user-friendly idea. Brilliant!

OVER clause in SQL Server 2012

Changes to the OVER clause

RANGE / ROWS clause

The new ROWS keyword in SQL Server 2012 expression allows us to specify the window frames over which our expressions will be evaluated.

The following ROWS clause, for example, allows us to restrict a calculation to all previous rows in a dataset for the defined partition and the provided ORDER BY clause
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Note that CURRENT ROW actually means all rows in the current partition that have the same ordering value (as defined by the ORDER BY clause) as the current one.

Self joins are no longer required to calculate running values.

As an example, the following query will return an event history from an event list. I considered a CRM system where cases/incidents/tickets can be created, closed and reopened. The raw event list should be transformed into an event history, displaying for each incident the periods over which it was opened.

WITH EventList AS (
SELECT *
FROM (
VALUES
(1, ‘NEW’, CAST(‘2012-01-04 15:00:00’ AS datetime))
,(1, ‘CLOSE’, CAST(‘2012-01-04 15:37:00’ AS datetime))
,(2, ‘NEW’, CAST(‘2012-01-05 15:00:00’ AS datetime))
,(2, ‘CLOSE’, CAST(‘2012-01-06 15:00:00’ AS datetime))
,(2, ‘REOPEN’, CAST(‘2012-01-07 15:00:00’ AS datetime))
,(2, ‘CLOSE’, CAST(‘2012-01-09 15:00:00’ AS datetime))
,(1, ‘REOPEN’, CAST(‘2012-01-18 15:00:00’ AS datetime))
)
AS DATA(ID, EventCode, EventDate)
)
SELECT
ID
, EventCode
, EventDate As StartDate
, MIN(CASE WHEN EventCode = ‘CLOSE’ THEN EventDate END)
OVER (PARTITION BY ID ORDER BY EventDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
AS EndDate
FROM EventList

Nice and simple.

A questionable design decision

Due to a questionable design decision, the behavior of the new extended OVER clause may be counter-intuitive.

Consider the expression:
COUNT(EventDate) OVER (PARTITION BY ID)

This gives you the number of rows for the current ID.

The following expression is now legal in SQL Server 2012 – earlier versions would have thrown an error (Incorrect syntax near ‘order’).
COUNT(EventDate)
OVER (PARTITION BY ID ORDER BY EventDate )

Since COUNT is an additive function – this implies the order of evaluation does not matter – I expected this expression to return exactly the same results. Furthermore, I expected this expression to return exactly the same result independently of the field specified in the ORDER BY clause.

However, if you use such an expression , you will find out SQL Server actually restricts the partition depending on the fields listed in the ORDER BY clause. The MSDN documentation is quite clear about this ‘If ROWS/RANGE is not specified but ORDER BY is specified, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame.’

If I provide no ORDER BY or ROWS/RANGE clause, SQL Server will assume RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

Why should I expect a different behavior, when I only explicitly modify the ORDER BY clause?

New functions in SQL Server 2012

A quick look at the new functions in SQL Server 2012.

PARSE

This function allows you to parse a string using an optional culture parameter. It is meant for conversions to date/datetime and numeric types. This is a CLR (.Net) function and it expects culture parameters supported by the .Net framework.

An example taken from the MSDN web page:
SELECT PARSE(‘Monday, 13 December 2010’ AS datetime2 USING ‘en-US’) AS Result

The advantages over the CONVERT function are:

  • a wider array of culture/style parameters
  • an improved readability

I noticed a performance hit the first time I ran this ‘query’. I guess the server had to load the .Net assembly first. I wonder if the assembly will stay in memory for the life-time of the DBE service, or if it may unload now and then. This will not be an issue if you do not store dates as text in your database and do your formatting in client applications, which are good practise, and only use this function to import data in a computer-unfriendly format.

TRY_CONVERT

This function is similar to the CONVERT function but it will return NULL instead of casting an error when conversion failed. I had to import data with a string column containing date information. The time stamps had a US or a German format depending on the settings of the user who had pulled the data. Using TRY_CONVERT and COALESCE, importing the data as a DATETIME would have been a breeze.

TRY_PARSE

This function is to PARSE what TRY_CONVERT is to CONVERT.

DATEFROMPARTS, DATETIME2FROMPARTS, …

DATEFROMPARTS lets you easily define a date.

An example from the MDSN web page:
SELECT DATEFROMPARTS ( 2010, 12, 31 ) AS Result

This looks slightly better than
SELECT CAST(‘20101231’ AS DATE)

However, SELECT DATEFROMPARTS ( year, month, day ) looks much better than
CAST(CAST(year* 10000 +month*100 + day AS VARCHAR(8)) AS DATE)

EOMONTH

This function returns the last day of the month for a given date or for the input date offset by a given number of month
SELECT EOMONTH ( GETDATE() ) AS LastDayOfCurrentMonth
SELECT EOMONTH ( GETDATE() ,1) AS LastDayOfNextMonth

CHOOSE

Returns the item at the specified index from a list of values.

Unless you like to hard code labels rather than create lookup tables, you will probably only use this function for application developpement, when users pick items from dropdowns.

Important stuff: Parameter expressions are evaluated only when they are needed:
SELECT CHOOSE(1, 321213  ,1/0)
SELECT CHOOSE(1, 321213  ,’zer’+ 1)

IIF

In case you are too ‘lazy’ to write a CASE statement. SQL Server will translate the IIF expression into the equivalent CASE statement.

Parameter expressions are only evaluated for the relevant branch:
SELECT IIF(1 = 1, 321213 ,1/0)
SELECT IIF(1 = 1, 321213 ,’zer’+ 1)

This is not the same behavior as in SSRS, where an expression might return an error if an irrelevant branch returns an error.

CONCAT

The + sign in SQL can have different meanings:

  1. when used with two strings, it represents a concatenation operator
  2. when used with two numeric values, it represents an addition operator
  3. when used with a string and a numeric value, it represents an addition operator (!!!)

I will make no further comments on the design decision behind point number 3, but thanks to the CONCAT function you now have a way to tell SQL server to implicitly cast numeric values to alphanumeric values, and not the other way around.

FORMAT

To return a value as a formatted string making use of the capabilities of the .Net framework, although you usually had better do the formatting on the client side, this might come handy.

Tip: avoid posting questions on this function in forums where Celko is active.

LOG

You can now specify a base argument for your logarithmic calculations.

Equality and trailing spaces in SQL

While reading posts on a forum, my attention was caught by a SQL oddity causing equality comparisons on strings with trailing spaces to possibly return unexpected results.

For example,
‘abc’ = ‘abc         ‘
will be considered to be true.

A fellow blogger provides more information about this.

The keyword for this behavior is “padding”.

To keep it simple, before a comparison between two strings is made, the server will add trailing spaces to the shorter one to make sure both strings have the same size. According to the post mentioned above, this is standard compliant.

More information can also be found in this article (MS knowledge base).

ARGMAX and ARGMIN in Excel

A blogger was wondering, whether it was possible to retrieve a reference to the cell where the result of the functions MAX or LARGE was found. With the proper combination of INDEX and MATCH, this is indeed possible.

Basically, the idea is to use the result of the MAX (or MIN or LARGE function) as an input for the standard INDEX-MATCH formula.

Suppose the range containing your data is called data_range.

Then,
third_largest = LARGE(data_range,3) returns the 3rd largest value in data_range.
p = MATCH(third_largest ,data_range,0) returns the relative position within data_range

The following returns a reference to the cell placed on the p-th row of the provided data range:
=INDEX(data_range,p)

If your data was arranged horizontally, you would use
=INDEX(data_range,1,p)
which tells excel to look at the first row and p-th column of the data range.

Put together, if  your data is arranged in a column, this gives:
=INDEX(data_range,MATCH(LARGE(data_range,3),data_range,0))

If you enter this formula on a spreadsheet, it will look like it returns the same value as the simpler LARGE(data_range,3) formula. This only appears to be so. When applied to a range, the INDEX function really returns a reference to a cell.

Practically, this means you can nest this formula in functions that take ranges as arguments. Like OFFSET, ROW, COLUMN, …

For example,
=ROW(INDEX(data_range,MATCH(LARGE(data_range,3),data_range,0)))
returns the absolute row number at which the first occurrence of the value was found.

This technique has its limits of course: it only allows you to retrieve the first cell which meets the given value.

Is it possible to retrieve all values for which a MAX value is reached?

Yes, but it quickly becomes more complex:
= LARGE(
IF(data_range=MAX(data_range),ROW(data_range)),
ROW(
OFFSET($A$1,0,0,
COUNT(
IF(data_range=MAX(data_range),ROW(data_range))
)
)
)
)

This formula, when entered as an array formula, will return all absolute row numbers of the cells that contain the max value.

The IF part checks whether the cells in data_range contain the max value. If this is the case, it will return the absolute row number for the range, else it will return the value FALSE. The ROW-OFFSET part is a trick to dynamically build an array containing the values 1 to N, where N stands for the number of cells containing the max value The LARGE part, in our example, will return an array with the relevant values. Note the large function ignores boolean values (TRUE/FALSE).

Let me note, at this point, that before you write such a formula in your worksheets, you may want to check first whether no other option is available (writing intermediate formulas, UDF in VBA, pivot table, …)

%d bloggers like this: