Query Data – using Table Expressions

Specifies a temporary named result set, known as a common table expression (CTE). This is derived from a simple query and defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE or MERGE statement. This clause can also be used in a CREATE VIEW statement as part of its defining SELECT statement. A common table expression can include references to itself. This is referred to as a recursive common table expression.

[ WITH <common_table_expression> [ ,…n ] ] <common_table_expression>::=

expression_name [ ( column_name [ ,…n ] ) ] AS ( CTE_query_definition )

Guidelines for Creating and Using Common Table Expressions

The following guidelines apply to nonrecursive common table expressions. For guidelines that apply to recursive common table expressions, see Guidelines for Defining and Using Recursive Common Table Expressions that follows.

A CTE must be followed by a single SELECT, INSERT, UPDATE, or DELETE statement that references some or all the CTE columns. A CTE can also be specified in a CREATE VIEW statement as part of the defining SELECT statement of the view.
Multiple CTE query definitions can be defined in a nonrecursive CTE. The definitions must be combined by one of these set operators: UNION ALL, UNION, INTERSECT, or EXCEPT.
A CTE can reference itself and previously defined CTEs in the same WITH clause. Forward referencing is not allowed.
Specifying more than one WITH clause in a CTE is not allowed. For example, if a CTE_query_definition contains a subquery, that subquery cannot contain a nested WITH clause that defines another CTE.

The following clauses cannot be used in the CTE_query_definition:

ORDER BY (except when a TOP clause is specified)
INTO
OPTION clause with query hints
FOR BROWSE

When a CTE is used in a statement that is part of a batch, the statement before it must be followed by a semicolon.
A query referencing a CTE can be used to define a cursor.
Tables on remote servers can be referenced in the CTE.
When executing a CTE, any hints that reference a CTE may conflict with other hints that are discovered when the CTE accesses its underlying tables, in the same manner as hints that reference views in queries. When this occurs, the query returns an error.

Guidelines for Defining and Using Recursive Common Table Expressions

The following guidelines apply to defining a recursive common table expression:

The recursive CTE definition must contain at least two CTE query definitions, an anchor member and a recursive member. Multiple anchor members and recursive members can be defined; however, all anchor member query definitions must be put before the first recursive member definition. All CTE query definitions are anchor members unless they reference the CTE itself.
Anchor members must be combined by one of these set operators: UNION ALL, UNION, INTERSECT, or EXCEPT. UNION ALL is the only set operator allowed between the last anchor member and first recursive member, and when combining multiple recursive members.
The number of columns in the anchor and recursive members must be the same.
The data type of a column in the recursive member must be the same as the data type of the corresponding column in the anchor member.
The FROM clause of a recursive member must refer only one time to the CTE expression_name.

The following items are not allowed in the CTE_query_definition of a recursive member:

SELECT DISTINCT
GROUP BY
PIVOT (When the database compatibility level is 110 or higher. See Breaking Changes to Database Engine Features in SQL Server 2016.)
HAVING
Scalar aggregation
TOP
LEFT, RIGHT, OUTER JOIN (INNER JOIN is allowed)
Subqueries
A hint applied to a recursive reference to a CTE inside a CTE_query_definition.

The following guidelines apply to using a recursive common table expression:-

All columns returned by the recursive CTE are nullable regardless of the nullability of the columns returned by the participating SELECT statements.
An incorrectly composed recursive CTE may cause an infinite loop. For example, if the recursive member query definition returns the same values for both the parent and child columns, an infinite loop is created. To prevent an infinite loop, you can limit the number of recursion levels allowed for a particular statement by using the MAXRECURSION hint and a value between 0 and 32,767 in the OPTION clause of the INSERT, UPDATE, DELETE, or SELECT statement. This lets you control the execution of the statement until you resolve the code problem that is creating the loop. The server-wide default is 100. When 0 is specified, no limit is applied. Only one MAXRECURSION value can be specified per statement. For more information, see Query Hints (Transact-SQL).
A view that contains a recursive common table expression cannot be used to update data.
Cursors may be defined on queries using CTEs. The CTE is the select_statement argument that defines the result set of the cursor. Only fast forward-only and static (snapshot) cursors are allowed for recursive CTEs. If another cursor type is specified in a recursive CTE, the cursor type is converted to static.
Tables on remote servers may be referenced in the CTE. If the remote server is referenced in the recursive member of the CTE, a spool is created for each remote table so the tables can be repeatedly accessed locally. If it is a CTE query, Index Spool/Lazy Spools is displayed in the query plan and will have the additional WITH STACK predicate. This is one way to confirm proper recursion.
Analytic and aggregate functions in the recursive part of the CTE are applied to the set for the current recursion level and not to the set for the CTE. Functions like ROW_NUMBER operate only on the subset of data passed to them by the current recursion level and not the entire set of data passed to the recursive part of the CTE. For more information, see example K. Using analytical functions in a recursive CTE that follows.

Features and Limitations of Common Table Expressions in SQL Data Warehouse and Parallel Data Warehouse

The current implementation of CTEs in SQL Data Warehouse and Parallel Data Warehouse have the following features and limitations:

A CTE can be specified in a SELECT statement.
A CTE can be specified in a CREATE VIEW statement.
A CTE can be specified in a CREATE TABLE AS SELECT (CTAS) statement.
A CTE can be specified in a CREATE REMOTE TABLE AS SELECT (CRTAS) statement.
A CTE can be specified in a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement.
A remote table can be referenced from a CTE.
An external table can be referenced from a CTE.
Multiple CTE query definitions can be defined in a CTE.
A CTE must be followed by a single SELECT statement. INSERT, UPDATE, DELETE, and MERGE statements are not supported.
A common table expression that includes references to itself (a recursive common table expression) is not supported.
Specifying more than one WITH clause in a CTE is not allowed. For example, if a CTE query definition contains a subquery, that subquery cannot contain a nested WITH clause that defines another CTE.
An ORDER BY clause cannot be used in the CTE_query_definition, except when a TOP clause is specified.
When a CTE is used in a statement that is part of a batch, the statement before it must be followed by a semicolon.
When used in statements prepared by sp_prepare, CTEs will behave the same way as other SELECT statements in PDW. However, if CTEs are used as part of CETAS prepared by sp_prepare, the behavior can defer from SQL Server and other PDW statements because of the way binding is implemented for sp_prepare. If SELECT that references CTE is using a wrong column that does not exist in CTE, the sp_prepare will pass without detecting the error, but the error will be thrown during sp_execute instead.

Recursive Queries using Common Table Expressions (CTE) in SQL Server

Common Table Expression (CTE) was introduced in SQL Server 2005 and can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. You can think of CTE as an improved version of derived tables that more closely resemble a non-persistent type of view. Look at CTEs as your derived tables in SQL Server 2000. A CTE can be used in many of the same ways you use a derived table. CTEs can also contain references to themselves. This allows the developer to write complex queries simpler. CTEs can also be used in place of views. The use of CTEs provides two main advantages. One is that queries with derived table definitions become more simple and readable. While traditional T-SQL constructs that are used to work with derived tables normally requires a separate definition for the derived data such as a temporary table or a table-valued function, using CTEs make it easier to see the definition of the derived table with the code that uses it. The other thing is that CTEs significantly reduces the amount of code required for a query that traverses recursive hierarchies.

To understand what a CTE is all about, let’s first take a look at the syntax to create it in SQL Server 2005.

Syntax
In general form a recursive CTE has the following syntax:

WITH cte_alias (column_aliases)  
AS  
( 
cte_query_definition   --initialization 
UNION ALL 
cte_query_definition2 --recursive execution 
)  
SELECT * FROM cte_alias

You provide the CTE with an alias and an optional list of aliases for its result columns following the keyword WITH which usually defines the derived table based on the query definition; write the body of the CTE; and refer to it from the outer query.

To put this in the right perspective, let’s come up with a simple example which uses recursion. We’ll look at the Employees table in the Northwind database and see that a particular employee reports to another employee. One question we can come up with is, “Who reports to whom?” The Employees table is designed in such a way that the ReportsTo column is a foreign key field that refers to the primary key field EmployeeID. Thus, we can create a query to answer our question. A sample query using CTE will look something like this.

WITH Managers AS 
( 
--initialization 
SELECT EmployeeID, LastName, ReportsTo  
FROM Employees 
WHERE ReportsTo IS NULL 
UNION ALL 
--recursive execution 
SELECT e.employeeID,e.LastName, e.ReportsTo 
FROM Employees e INNER JOIN Managers m  
ON e.ReportsTo = m.employeeID 
) 
SELECT * FROM Managers

Code Walkthrough

The recursive CTE, Managers, defines an initialization query and a recursive execution query
The initialization query returns the base result and is the highest level in the hierarchy. This is identified by the ReportsTo value of NULL, which means that the particular Employee does not report to anybody. Depending on how the table is designed, the value can be anything as long as it represents the highest level in the hierarchy
The recursive execution query is then joined to the initialization query using the UNION ALL keyword. The result set is based on the direct subordinate as returned by the initialization query, which then appears as the next level in the hierarchy. Recursion occurs because of the query referencing the CTE itself based on the Employee in the Managers CTE as input. The join then returns the employees who have their managers as the previous record returned by the recursive query. The recursive query is repeated until it returns an empty result set.
The final result set is returned by querying the Managers CTE

The sample query contains the elements that a recursive CTE must contain. What’s more is that the code is a lot more readable. This enables the developers to write complex queries with ease.

You can also use a query hint to stop a statement after a defined number of loops. This can stop a CTE from going into an infinite loop on a poorly coded statement. You do this by including the MAXRECURSION keyword in the SELECT query referring to the CTE. To use it in the previous example

SELECT * FROM Managers OPTION (MAXRECURSION 4)

To create a similar yet non-recursive query that produces the same result in SQL Server 2000, you might come up with something similar to this code:

DECLARE @rowsAdded INT 

--table variable to hold accumulated results 
DECLARE @managers TABLE --initialize @managers who do not have managers 
   (EmpID INT, MgrID INT, processed INT DEFAULT(0)) 

INSERT @managers  
SELECT EmployeeID, ReportsTo, 0  
FROM Employees  
WHERE ReportsTo IS NULL 

SET @rowsAdded=@@rowcount  

--do this while new employees are added in the previous iteration  
WHILE @rowsAdded > 0  
BEGIN 

   --mark employee records going to be found in this iteration with --processed=1  
   UPDATE @managers SET processed=1 WHERE processed=0 

   --insert employees who report to employees not yet processed  
   INSERT @managers  
   SELECT EmployeeID, ReportsTo, 0  
   FROM Employees e  
   INNER JOIN @managers r ON e.ReportsTo = r.EmpID  
   WHERE ReportsTo <> EmployeeID AND r.processed = 1  
           
   SET @rowsAdded = @@rowcount  

   --mark employee records found in this iteration as processed  
   UPDATE @managers SET processed=2 WHERE processed=1  

END  

SELECT * FROM @managers

CTE’s vs Temp tables – an Opinion

This is one of those topics that will get people fired up. But here goes.

I am mostly an explicit temp table developer. By contrast, I am not an implicit temp table developer. What I mean by that is – in writing SQL, you simply cannot avoid the use of tempdb. Either you write to tempdb by breaking queries out and intentionally create temp tables, or you write to tempdb by not breaking queries out (keeping them as long convoluted statements with a long convoluted query plan) and let the optimizer create “worktables”. In either case you are writing to tempdb whether you like it or not. Yet.. the difference is this:

Breaking them out:

You can control the size of the result set being written to disk
You can ensure that the execution plan is simple
You can utilize the materialized temp table data throughout the entire procedure
Temp tables contain statistics and can be indexed
To compare temp table development to CTE development is somewhat of an apples and oranges comparison. A CTE uses nothing special on the back end. It is simply a (potentially) clean way to write a query. The difference is this however. With a CTE, the execution plan of the main query becomes intertwined with the CTE, leaving more room for the optimizer to get confused. By contrast, when a temp table divides two queries, the optimizer is not intertwined with that which created the temp table and the execution plans stay simple and decoupled from each other.

There are also situations where you cannot just swap a CTE for a temp table. You want to write as little as possible to disk. So when you break the queries out, you want to do it in a way that limits the result sets as much as possible (while keeping the plan as simple as possible) so you do not write a lot to disk. Keeping that in mind, the same query techniques are not interchangeable.

Long story short, there is a game in SQL Server called “choose your bottleneck”. There is always a bottleneck, which do you want it to be? Do you want to be a slave to the optimizer by hoping it generates a good plan for a complicated query? Or would you rather create simple queries so you don’t have to be woken up at 2am because of 200 waiting tasks because one of the most called procs caught a bad plan? I choose keep the plan simple. Break up the queries. Mitigate the writes by limiting the result set. And get some sleep.