web analytics

Data, data everywhere 


Register now for ONCWIC 2014, October 24-25 in Guelph

The 2014 Ontario Celebration for Women in Computing Conference runs October 24-25 at the Science Complex Atrium at the University of Guelph, and the two-day event has lots to offer college and university computing students in Ontario.

Scheduled to speak at the conference are:

  • The Honorable Liz Sandals, Minister of Education, Province of Ontario
  • Bonnie Schmidt, Founder and President, Let’s Talk Science
  • Trina Alexson, Director, Advanced Services, Cisco Systems
  • Kelley Irwin, Vice President, Technology Solutions, Toronto-Dominion Bank
  • Kelly Ryan, Director of Development, IBM Canada

The 2014 Conference programme includes networking opportunities, a job/career fair, and interview and resume-writing workshops designed specifically for women, particularly students, in Computer Science and Computer Engineering programs.

Students from Conestoga College can attend ONCWIC 2014 for only $40, which includes the banquet on the Friday evening and meals and refreshments all day Saturday. Students who register for the conference will also have access to shared overnight accommodation in Guelph on the Friday evening.

For questions or further details, please contact one of the Conference organizers or my friend Wendy Powley of Queen’s University.


SHARCNET@Conestoga seminar series: Data Mining

The SHARCNET@Conestoga Seminar Series, organized by my colleague Dalibor Dvorski, is pleased to announce its first presentation of the year. Conestoga IT welcomes Dr. Ilias S. Kotsireas, Professor in the Department of Physics & Computer Science at Wilfrid Laurier University and Chair of the Association for Computing Machinery’s Special Interest Group on Symbolic and Algebraic Manipulation, who will speak on “Concepts and Algorithms in Data Mining”.

The talk is scheduled for Monday, October 6, from 17:00 to 18:00 in Doon 2A301.

Concepts and Algorithms in Data Mining

Data mining has had a pervasive and continuing impact on computer science and other disciplines over the past several years. This first talk in the SHARCNET@Conestoga Seminar Series will introduce, in detail, some of the most popular concepts in data mining, such as association rules and decision trees, and will illustrate them with real-life examples. It will also include a discussion of algorithms that compute association rules and decision trees efficiently. The talk is self-contained and does not require any previous knowledge of data mining; all students and faculty are invited to attend.


Dr. Ilias S. Kotsireas is a Professor of Computer Science in the Department of Physics and Computer Science at Wilfrid Laurier University. He has published over 100 journal and refereed conference papers, technical reports, books, and special issues of journals in the areas of computational algebra, dynamical systems, high-performance computing, and combinatorial design theory. He is also Chair of the Association for Computing Machinery’s Special Interest Group on Symbolic and Algebraic Manipulation. His research has been continuously funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) for the past 15 years.

For more information on this talk or the SHARCNET@Conestoga Seminar Series, contact Dalibor Dvorski at ddvorski@conestogac.on.ca.

For more information on SHARCNET, see https://www.sharcnet.ca/my/about.


DementiaHack 2014

The British Consulate-General in Toronto, along with HackerNest, presents DementiaHack 2014, a three-day hackathon in Toronto running from Friday, September 12, 2014 at 8:00 AM to Sunday, September 14, 2014 at 1:30 PM (EDT).

DementiaHack 2014 brings together the brightest minds in dementia management, patient care, and healthcare technology to tackle the most pressing challenges faced by caregivers and people with dementia.

The government of the United Kingdom is committed to developing the hardware and software prototypes that emerge from the event into tangible, lasting solutions that will actually benefit dementia patients and caregivers. In addition to cool take-home prizes, winners will go on a UK/Canada roadshow to demo their hacks to major organizations and healthcare conferences.

The competition is open to teams of at most 5 people, and Conestoga College students are encouraged to participate. DementiaHack 2014 is being held at The Digital Media Zone at Ryerson University, 10 Dundas Street East, 6th Floor, Toronto, ON M5B 2G9. Additional details can be found on the DementiaHack 2014 website.


The peril of complexity

In June 2011 I gave the keynote talk at the 2011 DBTEST Workshop in Athens, Greece, which was co-located with the 2011 ACM SIGMOD conference.

Here, I have re-posted the slides of my talk, entitled The Peril of Complexity. In the talk, I outline some of the software engineering issues that relational database vendors face in performance analysis and software testing, both of which become increasingly difficult as the complexity of a database system’s software stack continues to grow.

In the talk I also include a section that describes the expansion of the ISO SQL standard. Many practitioners fail to realize just how broad the latest versions of the SQL Standard have become. The SQL:2011 standard includes, to name just a few:

  • Stored procedures and functions
  • Structured types and arrays
  • Entity-type hierarchy support
  • Table functions (SELECT over a stored procedure)
  • Embedded XML and Java support
  • Regular expressions – both in predicates and in functions
  • WINDOW operators and OLAP functions, including linear regression
  • Recursive UNION and common table expressions

In addition to these SQL language features, commercial products commonly include the following features that are not part of the standard:

  • Multiple hardware and software platforms, including virtualized servers
  • Intra-query parallelism
  • Hash- and/or range-based data partitioning
  • Multi-version Concurrency Control (MVCC, sometimes called snapshot isolation)
  • Materialized views or join indexes
  • Some form of integrated full text search
  • Clustered and non-clustered indexes, bit-mapped indexes
  • Multidatabase capability
  • Data compression and encryption
  • Some form of distributed scale-out. Examples include: Sybase IQ Multiplex, Microsoft SQL Server, Amazon Data Services (MySQL), Sybase SQL Anywhere
  • User-defined functions, triggers, and procedures in virtually any programming language
  • Various APIs including JDBC, ODBC, OLEDB, ADO, ADO.NET, ESQL, LINQ, Entity Framework, PHP, Python, Ruby, Perl, CLR, OData

Each of the above features has its own constituency of users, and each adds complexity that can affect reliability, development productivity and, perhaps above all, performance, which is often the most important characteristic of what is increasingly becoming a commodity product.


Software and its growth rate

Recently, Les Hatton of Kingston University in London and Michiel van Genuchten of MTOnyx published an article [1] in IEEE Computer that summarized software metrics collected for IEEE Software magazine since 2010. The software projects described included:

  • Automobile engine control unit (Bosch);
  • Mobile phone media player (RealMedia);
  • Medical equipment (Philips);
  • Flight management system (Honeywell);
  • Tokyo railway control system (Hitachi);
  • Open-source workflow management system (University of Queensland);
  • Auto navigation system (TomTom);
  • Copier (Fuji/Xerox);
  • Bing search engine (Microsoft);
  • Oil reservoir simulation system (Shell);
  • Higgs boson particle discovery software (CERN).

These statistics, covering projects that range from 20,000 to 9.5 million lines of code, are important not only because empirical data from actual production systems is useful in its own right, but because all of the projects exhibit remarkably similar growth rates: from a minimum of 11% per annum (a compound annual growth rate (CAGR) of 1.11) to a maximum of 29% (a CAGR of 1.29). The median CAGR across all projects was 1.16, which means that a project’s size, measured in Source Lines Of Code (SLOC), roughly doubles every five years (1.16^5 ≈ 2.1).
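To make the compounding concrete, here is a small sketch that applies the minimum, median, and maximum CAGR values quoted above. The helper names growth_factor and doubling_time are my own, for illustration only; the doubling time simply solves cagr^t = 2 for t.

```python
import math

def growth_factor(cagr: float, years: int) -> float:
    """Size multiplier after `years` years of compound growth at `cagr` (e.g. 1.16)."""
    return cagr ** years

def doubling_time(cagr: float) -> float:
    """Years for code size to double: solve cagr**t == 2, so t = ln 2 / ln cagr."""
    return math.log(2) / math.log(cagr)

# CAGR values reported in the article: minimum, median, and maximum.
for label, cagr in [("minimum", 1.11), ("median", 1.16), ("maximum", 1.29)]:
    print(f"{label:>7}: CAGR {cagr:.2f}, "
          f"5-year growth x{growth_factor(cagr, 5):.2f}, "
          f"doubles in {doubling_time(cagr):.1f} years")
```

For the median CAGR of 1.16 this gives a five-year growth factor of about 2.10 and a doubling time of about 4.7 years; at the extremes, the doubling time ranges from roughly 6.6 years (1.11) down to roughly 2.7 years (1.29).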

It doesn’t matter that the projects are sized in SLOC (rather than a more comparable measure, such as function points), with all of the uncertainty and interpretation that entails (see reference [5]). What is tremendously interesting is that the growth rates across all of the projects are so strikingly similar, and that they match my 17 years of experience with the SQL Anywhere DBMS at Sybase/SAP. The study contains too few samples for its results to be statistically significant; hopefully this will change over the next few years. Nonetheless, one could use the median CAGR of 1.16 as a benchmark estimator of a software project’s growth over time, and compare that rough estimate with the project’s actual SLOC growth [1, p. 69]:

In real life, we would revisit the estimates after one year and use the actual CAGR instead of the median, which would reduce the error over five years. This isn’t far off for software estimation, considering we’re looking at a size estimate without applying any knowledge for the software product at hand. Applying a little bit of product knowledge could significantly improve the estimates. For example, safety-critical systems will generally have a lower CAGR because of stringent regulatory and quality requirements, but we would need to test this hypothesis with more data.
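The re-estimation step described above can be sketched as follows: project five years ahead with the median CAGR, then, after one year of actual data, compute the project’s own observed CAGR and re-project the remaining four years. The function names and the 1,000,000 and 1,120,000 SLOC figures are hypothetical, chosen only to illustrate the arithmetic; they are not data from the study.

```python
def project_size(current_sloc: float, cagr: float, years: float) -> float:
    """Projected SLOC after `years` years of compound growth at rate `cagr`."""
    return current_sloc * cagr ** years

def observed_cagr(start_sloc: float, end_sloc: float, years: float) -> float:
    """CAGR implied by two measured sizes taken `years` apart: (end/start)**(1/years)."""
    return (end_sloc / start_sloc) ** (1.0 / years)

MEDIAN_CAGR = 1.16  # median across the projects summarized in [1]

# Hypothetical project: 1,000,000 SLOC today (illustrative figure only).
baseline = 1_000_000
initial_estimate = project_size(baseline, MEDIAN_CAGR, 5)

# Hypothetical measurement one year later: the code base has grown to 1,120,000 SLOC.
after_one_year = 1_120_000
actual_rate = observed_cagr(baseline, after_one_year, 1)

# Revisit the estimate: project the remaining four years with the observed rate.
revised_estimate = project_size(after_one_year, actual_rate, 4)

print(f"initial 5-year estimate: {initial_estimate:,.0f} SLOC (CAGR {MEDIAN_CAGR})")
print(f"revised 5-year estimate: {revised_estimate:,.0f} SLOC (CAGR {actual_rate:.2f})")
```

In this made-up example the project is growing more slowly than the median, so the revised five-year estimate (about 1.76 million SLOC) is well below the initial one (about 2.10 million SLOC).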

The authors go on to discuss the implications of a CAGR of 1.16 at some length (see [1, p. 71] and [2]). These include whether initiating a software project is in fact a good investment, and whether a vendor can gain enough “mileage” [3] from the development effort. Notwithstanding these arguments, my own experience with SQL Anywhere development makes me think of other, perhaps more short-term, issues that also require careful consideration by management. Here are a few of them:

  • As software grows, so must the testing effort. Prior studies by Capers Jones (see reference [4, p. 384]) indicate that testing effort grows faster than linearly with project size. This affects not only the testing effort itself but also the staffing of the software testing function.
  • As projects grow, they will require additional development staff to maintain similar levels of productivity and reliability. It is unreasonable to assume that individuals can handle a twofold increase in the code they are responsible for simply by being “better”. Either tools and techniques must improve significantly, or the development staff must grow along with the project.
  • All other things being equal, software performance will dip over time. The reason is simply that as incremental improvements are made and the software becomes more sophisticated, existing code paths get longer, making it increasingly difficult to match prior performance on identical workloads. Over time, the only way to improve performance is through macro-level changes that embody an entirely different implementation. Every development project must therefore schedule time to develop these new techniques, or it will continually fall into the performance “trap” that is sprung by each new, often minor, enhancement.

[1] Michiel van Genuchten and Les Hatton (October 2013). Quantifying Software’s Impact. IEEE Computer 46(10), pp. 66-72.

[2] Michiel van Genuchten and Les Hatton (July/August 2012). Compound Annual Growth Rate for Software. IEEE Software 29(4), pp. 19-21.

[3] Michiel van Genuchten and Les Hatton (September 2011). Software Mileage. IEEE Software 28(5), pp. 24-26.

[4] Kshirasagar Naik and Priyadarshi Tripathy (2008). Software Testing and Quality Assurance: Theory and Practice. John Wiley and Sons. ISBN 978-0-471-78911-6.

[5] Capers Jones and Olivier Bonsignour (2012). The Economics of Software Quality. Addison-Wesley. ISBN 978-0-13-258220-9.


WBDB 2014 submission deadlines are extended

The workshop’s program committee has extended the submission deadlines for the Fifth International Workshop on Big Data Benchmarking, which will be held at the Hasso Plattner Institut in Potsdam, Germany, on August 5-6, 2014.

The new deadlines are:

  • May 30, 2014 (6pm PDT): Deadline for short versions of papers (4-8 pages), to be submitted using the EasyChair system.
  • June 20, 2014: Authors will be notified about paper acceptance.
  • August 30, 2014: Submission of full-length, camera-ready versions of papers (8-20 pages). Papers should be prepared in the Springer LNCS proceedings format.
  • Date of the workshop: August 5-6, 2014.

My thanks to workshop organizer Tilmann Rabl of the University of Toronto for the update.


Exploiting Text Workshop

On August 7-8, 2014, a text research workshop entitled “Exploiting Text” will be held at the University of Waterloo to celebrate the career and achievements of Professor Frank Wm. Tompa, who officially retired from the University of Waterloo on 1 December 2013.

Workshop Overview

Knowledge is most often captured and communicated through natural language text and preserved as digital documents. Documents are amassed into curated digital libraries or loosely bound into searchable repositories, such as the World Wide Web. With today’s widespread adoption of social media, documents covering all possible topics are created, stored, and shared in increasing numbers by amateurs and hobbyists, as well as by experts and scholars.

For many years, we have been developing tools for searching through document collections, extracting data, and summarizing selected sub-collections. The goal of this workshop is to explore directions in which we can make profitable advances in these areas and in which we can further exploit the text resources available in these collections.

We hope to explore applications that exploit large reference texts (such as the Oxford English Dictionary, the National Library’s Early Canadiana Online, or Wikipedia), extremely large corpora (including the Web or corporate intranets and extranets), open government data (such as Canada’s Open Data initiatives), and humanists’ research needs (such as the Margot project). Topics of interest include:

  • Organization, storage, and management of large reference texts and curated document repositories
  • Search techniques and search engine technology
  • Resource discovery and dissemination
  • Browsing, text mining, information extraction, summarization, and visualization


  • Raymond Ng, University of British Columbia, Vancouver BC
  • Glenn Paulley, Conestoga College, Kitchener ON
  • Ken Salem, University of Waterloo, Waterloo ON
  • Charlie Clarke, University of Waterloo, Waterloo ON
  • David DeHaan, SAP Labs, Waterloo ON

Invited talk: Philip Aylesworth, St. Clair College

“An Introduction to Representational State Transfer Application Programming Interface Design”

Philip Aylesworth, St. Clair College

Wednesday, 7 May 2014, 10:00 a.m., DMB-2A301

An application programming interface (API) is an integral part of a modern software application, and representational state transfer (REST) is the current fad in designing network APIs for desktop, Web, and mobile applications. If you are writing server-side code and developing Web APIs, you may not be putting enough thought into their design. Join us to learn about REST and receive an opinionated guide to best practices.

Philip Aylesworth is an instructor at St. Clair College. He teaches Linux and front-end Web development courses in the Internet Applications and Web Development academic program. Philip has enjoyed programming since he first learned how to develop Web applications in the 1990s as a Unix system administrator, when HTML and Perl were the only technologies required. Philip serves as a provincial technical committee member for Skills Ontario’s Web site development competition.

Thanks to Dalibor Dvorski for sending this my way.


ONCWIC to be held at the University of Guelph this fall

The fifth annual Ontario Conference for Women in Computing (ONCWIC) is being held at the University of Guelph on October 24-25, 2014. For women considering, or already pursuing, a career in Information Technology, ONCWIC offers a unique opportunity to:

  • Expand your professional network in southern Ontario and Canada;
  • Meet IT professionals and subject matter experts;
  • Gain insight into potential applications of computing;
  • Inspire other women in computing.

Registration is not yet open, but fees for university and college students will be inexpensive.

Thanks to my friend Wendy Powley of Queen’s University for sending this my way.


Congratulations to diploma program Capstone project winners

Congratulations to the winners of the 2013-14 Capstone project competition for IT diploma programs! Projects were judged on Wednesday, April 23rd by a panel of industry judges, including Jason Hinsperger, Eric Farrar, Dan Farrar, and Wayne Pau of SAP AG.

The winning project teams were:

  • CP/A: Sara Dutton, Corinne Edwards, and Mary Prescott: a web-based scheduling/booking system for a summer camp;
  • CP: Justin Zdolski, Manjiang Yu, and Yan Wang: a web-based flower recognition application;
  • CAD: Bahareh Samavati, Durga Makhija, and Ifeyinwa Ezenyimulu: an appointments application for a beauty salon.

These three winning projects will compete against other best-in-program projects at the annual Conestoga Tech@Work competition and showcase, which will be held on Tuesday, April 29 from 2:00-5:00pm in the Atrium Gallery of the Engineering building at Conestoga’s Cambridge campus. In addition, certificates and cash bursaries will be awarded to the winning students at convocation.
