Recently, Les Hatton of Kingston University in London and Michiel van Genuchten of MTOnyx published an article in IEEE Computer [1] that summarized software metrics collected for IEEE Software magazine since 2010. The software projects described included:
- Automobile engine control unit (Bosch);
- Mobile phone media player (RealMedia);
- Medical equipment (Philips);
- Flight management system (Honeywell);
- Tokyo railway control system (Hitachi);
- Open-source workflow management system (University of Queensland);
- Auto navigation system (TomTom);
- Copier (Fuji Xerox);
- Bing search engine (Microsoft);
- Oil reservoir simulation system (Shell);
- Higgs-Boson particle discovery software (CERN).
These projects range in size from 20,000 to 9.5 million lines of code. The importance of the collected statistics is not only that empirical data from actual production systems is useful in itself, but that all of the projects exhibit strikingly similar growth rates: from a minimum of 11% per annum (a compound annual growth rate (CAGR) of 1.11) to a maximum of 29% (a CAGR of 1.29). The median CAGR across all projects was 1.16, which means that after five years the size of a project, measured in source lines of code (SLOC), roughly doubles, since 1.16^5 ≈ 2.1.
It doesn’t matter that the projects are sized using SLOC (rather than something more comparable, such as function points), with all of the uncertainty and interpretation that entails (see reference [2]). What is tremendously interesting is that the growth rate across all the projects is so strikingly similar, and it matches completely with my 17 years of experience at Sybase/SAP with the SQL Anywhere DBMS. The number of samples contained in the study is too small to be statistically significant; hopefully that will change over the next few years. Nonetheless, one could use the median CAGR of 1.16 as a benchmark estimator for software project size over time, and that rough estimate can be compared to the actual SLOC growth for a project [1, pp. 69].
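To make the arithmetic concrete, here is a minimal sketch of projecting a code base forward at the study’s median CAGR of 1.16; the starting size and horizon are hypothetical, not taken from the study.

```python
# Project SLOC growth under a compound annual growth rate (CAGR).
# The starting size below is hypothetical, chosen only for illustration.

def project_sloc(initial_sloc: float, cagr: float, years: int) -> float:
    """Projected code-base size after `years` of compound growth."""
    return initial_sloc * cagr ** years

initial = 500_000      # hypothetical starting code base
median_cagr = 1.16     # median CAGR reported in the study

for year in range(1, 6):
    print(f"year {year}: {project_sloc(initial, median_cagr, year):,.0f} SLOC")

# Since 1.16**5 is about 2.1, the code base roughly doubles in five years.
```

The same one-liner works as a quick sanity check against a project's actual size history: plug in the measured starting SLOC and compare the projection to each year's measurement.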
In real life, we would revisit the estimate after one year and use the actual CAGR instead of the median, which would reduce the error over the five-year horizon. That isn’t far off for software estimation, considering we’re looking at a size estimate made without applying any knowledge of the software product at hand. Applying even a little product knowledge could significantly improve the estimates. For example, safety-critical systems will generally have a lower CAGR because of stringent regulatory and quality requirements, though we would need more data to test this hypothesis.
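A sketch of that re-estimation step, with hypothetical sizes: derive the CAGR actually observed over the first year, then use it in place of the median for the remaining four years of the five-year projection.

```python
# Re-estimate growth after one year: replace the median CAGR with the
# CAGR actually observed, then project the remaining four years.
# All sizes below are hypothetical, chosen only for illustration.

def observed_cagr(start_sloc: float, end_sloc: float, years: int) -> float:
    """Solve end/start = cagr**years for cagr."""
    return (end_sloc / start_sloc) ** (1 / years)

start = 500_000           # hypothetical size at project start
after_one_year = 560_000  # hypothetical measured size a year later

cagr = observed_cagr(start, after_one_year, 1)   # 1.12 for these numbers
five_year_estimate = after_one_year * cagr ** 4  # four years remaining

print(f"observed CAGR: {cagr:.2f}")
print(f"revised five-year estimate: {five_year_estimate:,.0f} SLOC")
```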
The authors go on to discuss the implications of a CAGR of 1.16 at some length (see references [1, pp. 71] and [3]). These include whether or not initiating a software project is in fact a good investment, and whether a vendor can gain enough “mileage” from the development effort. Notwithstanding these arguments, my own experiences with SQL Anywhere development make me think of other, perhaps more short-term, issues that still require careful consideration by management. Let me provide a few of these:
- As software grows, so must the testing effort. Prior studies by Capers Jones (see reference [4, pp. 384]) indicate that testing effort grows more than linearly with project size. This has implications not only for the amount of testing, but for the staffing of the software testing function as well.
- As projects grow, they will require additional development staff to achieve similar levels of productivity and reliability. It is unreasonable to assume that individuals can handle a factor of 2 increase in the code that they are responsible for simply by being “better”. Either tools and techniques must improve significantly, or the number of development staff must increase to match the project’s size as it grows.
- All other things being equal, software performance will dip over time. The reason is simple: as incremental improvements are made and the software becomes more sophisticated, existing code paths get longer, making it more and more difficult to match prior performance on identical workloads. Over time, the only way performance can be improved is through macro-level modifications that embody an entirely different implementation. Every development project must therefore schedule time to develop these new techniques, or it will continually fall into the performance “trap” that is automatically sprung by each new, often minor, enhancement.
[1] Michiel van Genuchten and Les Hatton (October 2013). Quantifying Software’s Impact. IEEE Computer 46(10), pp. 66-72.
[2] Michiel van Genuchten and Les Hatton (July/August 2012). Compound Annual Growth Rate for Software. IEEE Software 29(4), pp. 19-21.
[3] Michiel van Genuchten and Les Hatton (September 2011). Software Mileage. IEEE Software 28(5), pp. 24-26.
[4] Capers Jones and Olivier Bonsignour (2012). The Economics of Software Quality. Addison-Wesley. ISBN 978-0-13-258220-9.
[5] Kshirasagar Naik and Priyadarshi Tripathy (2008). Software Testing and Quality Assurance: Theory and Practice. John Wiley and Sons. ISBN 978-0-471-78911-6.