This week I am here at Schloss Dagstuhl attending the second Dagstuhl workshop, numbered 12321, on the topic of Robust Query Processing. This workshop, organized by Goetz Graefe of HP Labs along with Harumi Kuno, Wey Guy, and myself, follows a similar workshop on the same topic held in September 2010.
Self-managing database technology, which includes automatic index tuning, automatic database statistics, self-correcting cardinality estimation in query optimization, dynamic resource management, adaptive workload management, and many other approaches, tends to be studied in isolation from other server components. At the September 2010 workshop, participants concentrated on three things:
- determine approaches for evaluating robust query processing technologies in the ‘real’ environment where these independently-developed components would interact;
- establish a metric with which to measure the ‘robustness’ of a database server, so that quantitative comparisons of particular approaches become feasible. For example, is dynamic join reordering during query execution worth more than cardinality estimation feedback from query execution to query optimization?
- utilize a metric, or metrics, to permit the construction of regression tests for particular systems.
While the 2010 workshop produced a number of papers [1-3], particularly concerning new benchmark proposals, we were unable to reach a consensus on what precisely constitutes robust behaviour. Without agreement on that goal, it was difficult to make sufficient progress on strategies for improving performance robustness.
In this 2012 workshop, both previous 10381 attendees and new invitees have been working on ideas specifically related to the transfer of query processing tasks from one “phase” of query processing to another – for example, moving aspects of physical database design to query optimization, or moving aspects of optimization (such as the determination of a join strategy for a particular query) to the query execution phase. Of course many of these ideas have prior art, going back (at least) to the late Gennady Antoshenkov’s seminal papers from the early 1990s, when he worked on DEC Rdb. Our focus, however, has been on query processing changes that improve performance robustness, not (simply) performance per se, and that difference has led to the characterization of a number of interesting approaches and tradeoffs.
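To make the execution-phase idea concrete, here is a minimal sketch of deferring the join-strategy decision until run time, once the actual cardinality of one input is observable. The function name and threshold are hypothetical illustrations, not any product's implementation:

```python
def adaptive_join(left_rows, right_rows, key, hash_threshold=100):
    """Join two lists of row dicts on `key`, choosing the strategy at run time."""
    left = list(left_rows)    # materialize to observe the actual cardinality
    right = list(right_rows)
    if len(left) >= hash_threshold:
        # Large build side: a hash join amortizes the cost of building the table.
        table = {}
        for row in left:
            table.setdefault(row[key], []).append(row)
        return [(l, r) for r in right for l in table.get(r[key], [])]
    # Small build side: a simple nested-loop join avoids the hash build entirely.
    return [(l, r) for l in left for r in right if l[key] == r[key]]
```

A real system would of course base the decision on richer run-time feedback than a single row count, but the shape is the same: the optimizer leaves the choice open, and the executor commits only after seeing data.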
This year’s workshop discussions have been vibrant, which is unsurprising given the talented people assembled: Ken Salem and Ihab Ilyas of the University of Waterloo, Christoph Freytag from Berlin’s Humboldt University, Surajit Chaudhuri from Microsoft Research, Peter Boncz, Stratos Idreos, Stefan Manegold and Martin Kersten from Amsterdam’s CWI, as well as product engineers such as Andrew Lamb from Vertica, Campbell Fraser of Microsoft, Meikel Poess and Allison Lee of Oracle, and Ani Nica of SAP. I fully expect that some of the ideas discussed this week will appear in the literature within the next 12-18 months.
[1] Zhongxian Gu, Mohamed Soliman, and Florian Waas (June 2012). Testing the Accuracy of Query Optimizers. In Proceedings of the 2012 ACM DBTest Workshop, Scottsdale, Arizona.
[2] Rick Cole, Florian Funke, Leo Giakoumakis, Wey Guy, Alfons Kemper, Stefan Krompass, Harumi Kuno, et al. (June 2011). The Mixed Workload CH-benCHmark. In Proceedings of the 2011 ACM DBTest Workshop, Athens, Greece.
[3] Martin Kersten, Alfons Kemper, Volker Markl, Anisoara Nica, Meikel Poess, and Kai-Uwe Sattler (June 2011). Tractor Pulling on a Data Warehouse. In Proceedings of the 2011 ACM DBTest Workshop, Athens, Greece.