A recent email exchange with software consultant Breck Carter regarding the performance of user-defined functions in SQL has prompted me to re-post this article that I originally wrote in 2008. User-defined functions in SQL have their uses; unfortunately their performance characteristics are highly variable because it is very
difficult, if not impossible, for a query optimizer to take their execution characteristics into account when deciding on the best access plan for a given query. For example, a change to the join order lower down in an execution plan may completely alter the ordering of rows in an intermediate result. If a user-defined function sits above the join, the execution characteristics of the UDF may be negatively impacted, particularly if the new row ordering negates any benefit of memoization – the practice of caching prior execution results of the UDF for subsequent re-use, rather than naively re-executing the function each time. Re-executing a user-defined function in SQL is prohibitively expensive due to the creation and tear-down of the UDF's execution context (essentially the same as when executing a stored procedure), which destroys memory and CPU cache locality when executing the query plan.
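A minimal sketch of the memoization idea, in Python rather than SQL Anywhere internals: results are cached keyed on the argument values, so repeated arguments skip re-execution. The function names here are hypothetical stand-ins for a SQL UDF and its cache.

```python
call_count = 0
cache = {}

def expensive_udf(x):
    """Stand-in for a SQL user-defined function with costly setup/tear-down."""
    global call_count
    call_count += 1
    return x * 2  # placeholder computation

def memoized_udf(x):
    """Return a cached result when the same argument value recurs."""
    if x not in cache:
        cache[x] = expensive_udf(x)
    return cache[x]

# Highly repetitive inputs: the cache pays off.
for v in [1, 1, 2, 2, 1, 2]:
    memoized_udf(v)
assert call_count == 2  # only two distinct values were actually computed

# Nearly unique inputs: almost every call misses the cache.
cache.clear()
call_count = 0
for v in range(100):
    memoized_udf(v)
assert call_count == 100  # memoization bought nothing
```

This is exactly the sensitivity described above: whether memoization helps depends entirely on how often the correlation values repeat, and on whether row ordering keeps repeats close together.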
Anticipating the performance of such complex queries in a production environment can be exceedingly difficult because the performance of the UDF may be affected not only by the size of the underlying tables but also by the degree of correlation between the values referenced within the function, which determines the benefit of memoization. Testing such queries over small test databases will typically not alert the developer to the possibility of performance issues in a production setting.
The remainder of this blog entry is copyright by Sybase, Inc., an SAP Company, and first appeared on Glenn Paulley’s Sybase blog (http://iablog.sybase.com/paulley/) on 16 May 2008. It is reprinted here with permission.
Set-level operations do matter
One of the problems caused by the ad-hoc relational mappings offered by the Hibernate object persistence library is the generation of queries that contain subqueries – sometimes in plentiful numbers. Subqueries can be very difficult to optimize because it remains a challenging research problem to estimate both their cost and their selectivity. SQL Anywhere contains quite sophisticated query rewrite optimizations for subqueries. However, for most commercial products, including SQL Anywhere, the optimization and execution of subqueries remains problematic.
This week I’ve been doing some problem determination for a customer application, and what I want to convey in this post is that the impact of subquery execution can be pronounced, to the extent that it may affect the scalability of the application.
What I have commonly seen in complex queries from a variety of applications is a tendency to utilize subselects or user-defined functions to provide some measure of abstraction within complex queries. The complex query I was analyzing today – written by hand – is a four-way join of complex views, and the first view is itself a complex view that, even after rewrite optimizations that convert subqueries to joins, contains 48 subqueries. Most, though not all, of the subqueries are of the following form:
SELECT <SELECT list FROM T>,
       ( SELECT R.X FROM R WHERE R.PK = T.X )
FROM T
The subselect is guaranteed to return at most a single row because the WHERE clause of the subselect covers the primary key columns of table R; if T.X is NULL, or if no matching row exists in R, the subselect returns a NULL value. These properties mean that the nested query above is equivalent to the outer join query
SELECT <SELECT list FROM T>, R.X
FROM T LEFT OUTER JOIN R ON (T.X = R.PK)
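To see the equivalence concretely, here is a small sqlite3 sketch (table and column names are hypothetical, scaled down from the example above). Because R.pk is a primary key, the scalar subquery matches at most one row, and both formulations return identical results, including NULLs for unmatched or NULL values of T.x:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE R (pk INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE T (a INTEGER, x INTEGER);
    INSERT INTO R VALUES (1, 'one'), (2, 'two');
    INSERT INTO T VALUES (10, 1), (20, 2), (30, NULL), (40, 99);
""")

# Nested-query form: a scalar subquery evaluated per row of T.
subquery = con.execute("""
    SELECT T.a, (SELECT R.x FROM R WHERE R.pk = T.x)
    FROM T ORDER BY T.a
""").fetchall()

# Equivalent set-level form: a left outer join on the primary key.
outer_join = con.execute("""
    SELECT T.a, R.x
    FROM T LEFT OUTER JOIN R ON (T.x = R.pk)
    ORDER BY T.a
""").fetchall()

assert subquery == outer_join
assert subquery == [(10, 'one'), (20, 'two'), (30, None), (40, None)]
```

The equivalence depends on the primary-key guarantee: if more than one R row could match, the outer join would duplicate rows of T while the scalar subquery would raise a cardinality error.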
To illustrate the performance differences between these constructions, I ran queries similar in construction to the examples above against an actual customer database. In my tests, table T contains 635K rows, and table R is a view over a single base table comprising 1.1 million rows. One slight difference from the example above is that table R has a 4-column composite key, so the search condition in both the subselect and the left outer join's ON condition contains four equality predicates. To keep each query's construction intact while eliminating client-server transmission costs, I encapsulated the queries above in a derived table and used aggregate functions in each case to limit the final result set to a single row. Rather than use the FETCHTST utility to compare execution times, I used graphical plans with actual statistics from DBISQL to capture detailed statistics about each execution. The results:
- As a subquery, the request completes in 15.29 seconds. Subquery memoization does not pay off with this request, because the correlation values from T are (almost) unique, and consequently previously memoized results cannot be used for subsequent invocations.
- As an outer join, with a single CPU (thread) the query executes in 7.92 seconds. The reason: the left outer join avoids nested-iteration semantics. If intra-query parallelism is enabled (the default), the query’s execution time falls to 2.16 seconds as the result of using parallel hash outer join.
- Finally, I rewrote the subselect as a PSM user-defined function, another technique often used to encapsulate logic in an SQL statement. The function looks something like this:
CREATE FUNCTION dba.foo( IN @x INTEGER, IN @y INTEGER,
                         IN @w INTEGER, IN @z INTEGER )
RETURNS CHAR(1)
BEGIN
    DECLARE @lResult CHAR(1);
    IF ISNULL(@x,-1) = -1 THEN
        RETURN 'N'
    END IF;
    SELECT R.X INTO @lResult
    FROM R
    WHERE R.PK1 = @x AND R.PK2 = @y AND R.PK3 = @w AND R.PK4 = @z;
    RETURN(@lResult)
END
and the query becomes
SELECT MAX(dt.a), MAX(dt.y)
FROM ( SELECT T.a, foo(T.x, T.y, T.w, T.z) AS y FROM T ) AS dt
The running time for this query is over five minutes (320 seconds). The reason? The conditional logic in the function's preamble prevents inlining the function into the query proper, and the execution cost includes the construction and tear-down of the execution context required for the user-defined function and, additionally, the periodic re-optimization of the UDF's SELECT statement.
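The per-row invocation pattern behind that 40x slowdown can be made visible with sqlite3, which lets a Python function be registered as a SQL UDF. This sketch (hypothetical scaled-down schema; a dictionary stands in for the UDF's internal SELECT) counts how many times the engine calls the function – once per row of T, with no set-level optimization possible:

```python
import sqlite3

calls = 0
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE R (pk INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE T (a INTEGER, x INTEGER);
    INSERT INTO R VALUES (1, 'one'), (2, 'two');
    INSERT INTO T VALUES (10, 1), (20, 2), (30, NULL), (40, 2);
""")

# Preload R into a dict so the stand-in UDF can perform its lookup.
r_index = dict(con.execute("SELECT pk, x FROM R"))

def foo(x):
    """Stand-in for dba.foo: returns 'N' for NULL input, else the R.x value."""
    global calls
    calls += 1
    if x is None:
        return 'N'
    return r_index.get(x)

con.create_function("foo", 1, foo)
rows = con.execute("SELECT T.a, foo(T.x) FROM T ORDER BY T.a").fetchall()
assert calls == 4           # one UDF invocation per row of T
assert rows[2] == (30, 'N')
```

Scale the four rows of T up to 635K and the per-invocation overhead – in a real engine, context construction and tear-down plus statement re-optimization – dominates the query's cost, which is exactly what the 320-second measurement shows.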
The lesson? Set-level operations may not matter much with small volumes of data or transactions, but their power becomes critical at larger data volumes. Your mileage, of course, may vary.
Could relational systems execute these constructions more efficiently? Yes, though that degree of sophistication will take significant time to implement. More on that in another post.