Common queries : Grouping / aggregating over several columns.


I was looking for common SQL queries which could be used as a battery test for business intelligence tools.

I found quite a bunch of examples on the following website http://www.artfulsoftware.com/infotree/queries.php and decided to take a look at them.

Two query examples had the same concern.

1). Suppose you have the following table that summarizes squash court bookings.

image

You want each member of the club to pay half the fee, if he only appears in one column (member1 or member2), or to pay the full price, if he appears in both columns.

2). Suppose you have the following teams and games tables.

image

image

You want to get this output:

image

For each team, you want to count the number of games played (if the team appears in the column team1 or team2 from the games table) and number of games won (if the team appears in the team1 column and score1 > score2 or it appears in the team2 column and score1 < score2 ), lost, drawn and the number of goals scored, …

Writing a SQL query to do this is not hard. On may go through a UNION operation and then summarize the data accordingly, as proposed the site mentioned above, but I find the following query to be closer to the natural formulation of the problem:

SELECT
T.name
, SUM(CASE
WHEN T.id = team1 AND score1 > score2 THEN 1
WHEN T.id = team2 AND score1 < score2 THEN 1
ELSE 0
END) AS WON
, SUM(CASE
WHEN T.id = team1 AND score1 = score2 THEN 1
WHEN T.id = team2 AND score1 = score2 THEN 1
ELSE 0
END) AS DRAW
    … and so on
FROM teams AS T
LEFT JOIN games AS G
ON T.id = G.team1
OR T.id = G.team2
GROUP BY T.name

If you had no separate table for the team, you would have to replace the team table in the previous query with a sub-query. Something along the line :  (SELECT teamname1 FROM games UNION SELECT teamname2 FROM games).

Using Excel, building such a summary table is easy, provided you have the list of team names you are interested in. Unfortunately, there is no easy way to retrieve a list of unique values from both columns out-of-the box. Of course, this can still be done with some VBA to automate the process, but this is not native. Please note the spreadsheet from Google Doc offers the nice UNIQUE function, which enables to get a list of distinct values from a range, but the function will not work for values spread across 2 columns. Pivot tables will not work in that case either, and they do not support many-to-many relationships anyway. This is true for the standard pivot table, this is also true for Powerpivot V1. (V2 will offer some support for overriding a relationship with an expression.)

Now, what I would like to see in a BI system is the ability to support such queries, and as well support for building such queries with some simple drag and drop operation. If you know of a tool with these features, please let me know.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: