Data, data everywhere

How to give a query optimizer fits

Recently I have been interacting with a number of people, including Lukas Eder of jOOQ and consultant/blogger Vlad Mihalcea, on SQL and its merit as a query language, particularly with respect to object-relational mapping frameworks such as Hibernate. With Hibernate’s native HQL, or with JPA’s JPQL language, Hibernate is responsible for constructing the SQL query passed to the underlying relational database system, rather than the programmer writing it directly. Sometimes the precise nature of that SQL query matters, which led me to recall this article that I originally wrote in 2008. I’ve included it below, in its entirety.

This blog entry is copyright by Sybase, Inc., an SAP Company, and first appeared on Glenn Paulley’s Sybase blog (http://iablog.sybase.com/paulley/) on 13 September 2008. It is reprinted here with permission.

How to give a query optimizer fits

I am keenly interested in studying SQL constructions. Obviously it is part of my job to understand and anticipate common SQL constructions, since the SQL Anywhere server has to optimize and execute those statements. But I am also interested in studying these constructions to gain additional insight into how application developers write their applications, and whether or not SQL is succeeding (or failing) as a database sub-language.

Recent interaction with several customers has exposed some interesting tradeoffs related to particular SQL constructions. Let me illustrate what I mean with a real customer example, though transformed so as to execute over the DEMO database supplied as a sample with SQL Anywhere 11:

CREATE FUNCTION CurrentSalesRep(IN @CustomerID INTEGER, IN @FinancialCode CHAR(2), IN @Region CHAR(20), IN @dFrom DATE, IN @dTo DATE) RETURNS INTEGER
BEGIN
  DECLARE @SalesRep INTEGER;
  SET @SalesRep = NULL;
  IF ISNULL( @CustomerID, 0 ) <> 0 THEN
    SELECT FIRST r.SalesRepresentative INTO @SalesRep
    FROM SalesOrders r
    WHERE r.CustomerID = @CustomerID AND r.FinancialCode = @FinancialCode
    AND TRIM(ISNULL(r.Region, '')) = TRIM(ISNULL(@Region, ''))
    AND ISNULL(r.OrderDate, '1900-01-01') <= ISNULL(@dTo, CURRENT DATE)
    AND ISNULL(r.OrderDate, '2200-01-01') >= ISNULL(@dFrom, CURRENT DATE);
  END IF;
  RETURN @SalesRep;
END

and the function is called with the following parameters:

SELECT CurrentSalesRep( 101, 'r1', 'Eastern', '2000-03-01', '2000-04-01') FROM DUMMY
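
Predicates like these, where ISNULL and TRIM wrap the column references themselves, are typically hard for an optimizer to match to indexes. As one hedged sketch of an alternative, assuming the intent is that a NULL parameter means “no restriction” (note this is not NULL-for-NULL identical to the original, which also maps NULL column values to defaults):

-- A sketch only: test each parameter for NULL separately, leaving the
-- column references bare so that indexes on them remain usable.
SELECT FIRST r.SalesRepresentative INTO @SalesRep
FROM SalesOrders r
WHERE r.CustomerID = @CustomerID
  AND r.FinancialCode = @FinancialCode
  AND ( @Region IS NULL OR r.Region = @Region )
  AND ( @dTo IS NULL OR r.OrderDate <= @dTo )
  AND ( @dFrom IS NULL OR r.OrderDate >= @dFrom );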


Schema design tools are a tough challenge

Recently I was in Lausanne, Switzerland at EPFL, acting as one of the examiners for Ioannis Alagiannis’ PhD thesis defence. Yannis is a student of Natassa Ailamaki, whose work in database systems I’ve long admired. Yannis’ thesis concerned issues surrounding NoDB – the idea of having a DBMS perform query processing over raw data files, rather than incurring the overhead of loading the data into the database beforehand. Part of Yannis’ thesis is on physical database design tools, work first published at the DBTest Workshop held in Scottsdale, Arizona in 2012. In that paper, Yannis and co-author Renata Borovica demonstrated the difficulty in relying on physical design tools because of inconsistent results. Their work reminded me of this post that I wrote some time ago; here it is again.

This blog entry is copyright by Sybase, Inc., an SAP Company, and first appeared on Glenn Paulley’s Sybase blog (http://iablog.sybase.com/paulley/) on 7 April 2008. It is reprinted here with permission.

Schema design tools are a tough challenge

Materialized views represent an important addition to a DBA’s toolkit. A materialized view is a precomputed view, stored in the database as an ordinary table, and kept up-to-date using either a batch refresh mechanism, or immediately within the same transaction that modifies the view’s underlying base tables. Whether or not immediate modifications to the view can be done depends on the view definition – given an UPDATE, INSERT or DELETE to any particular base table row, the system must be able to compute precisely those rows in the view that are affected by the modification.
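
For concreteness, creating and manually refreshing a materialized view in SQL Anywhere looks roughly like the following sketch (the view definition itself is hypothetical):

-- A hypothetical materialized view, precomputing order counts per rep.
CREATE MATERIALIZED VIEW SalesByRep AS
  SELECT SalesRepresentative, COUNT(*) AS orders
  FROM SalesOrders
  GROUP BY SalesRepresentative;

-- For a manually-maintained view, recompute its contents on demand.
REFRESH MATERIALIZED VIEW SalesByRep;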

The magic behind materialized views is that the query optimizer can substitute one or more materialized views for base tables referenced by an SQL query, and do so automatically; this cost-based substitution process is called view matching. With views that are refreshed manually, it is possible to query stale data from the view: in such cases, if desired, the staleness that a query will tolerate can be specified as an option setting or in the query text itself. The option setting to control this in SQL Anywhere is the materialized_view_optimization option.
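
As a sketch of what such a setting might look like in SQL Anywhere (the option values shown here are indicative; consult the documentation for the full set):

-- Allow the optimizer to consider manually-refreshed (possibly stale) views:
SET OPTION PUBLIC.materialized_view_optimization = 'Stale';

-- Or, for this connection only, restrict view matching to fresh views:
SET TEMPORARY OPTION materialized_view_optimization = 'Fresh';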

The problem for a DBA to solve with materialized views is which materialized views to create, a problem that the academic literature calls the view selection problem. In many ways, determining which materialized views to create is similar to deciding which indexes to create. Several of the major commercial systems that support materialized views have included view selection algorithms in their index selection administrative tools. With SQL Anywhere, we have yet to include view selection with the SQL Anywhere Index Consultant, though we have plans to do so in a future release.

What I’d like to illustrate here is a problem with view selection that none of the existing administrative tools consider. Suppose we have transactions A and B that modify tables R and S respectively. It may be, in fact, that A and B do not refer to any of the same tables, and hence there is little contention, and no lock contention, between A and B when they execute concurrently.
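
For instance, consider the following sketch, with hypothetical tables R and S. Before the view exists, A and B never block one another; once V must be maintained immediately, both transactions update V and can now conflict (the immediate-refresh syntax varies by product):

-- Hypothetical: a view joining the two otherwise-unrelated tables.
CREATE MATERIALIZED VIEW V AS
  SELECT R.pk, R.a, S.b
  FROM R JOIN S ON ( S.fk = R.pk );

-- If V is maintained immediately, every transaction that modifies
-- R or S must now also modify V.
ALTER MATERIALIZED VIEW V IMMEDIATE REFRESH;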


Call for Participation: 6th International Workshop on Big Data Benchmarking

The Sixth International Workshop on Big Data Benchmarking will be held in Toronto, Ontario at the Bahen Centre for Information Technology at the University of Toronto on 16-17 June 2015. Keynote speakers at the two-day workshop are Tamer Ozsu of the University of Waterloo and Anil Goel, VP and Chief Architect for the HANA platform at SAP.

The workshop program is available here. Speakers at the Workshop include organizers Chaitan Baru and Tilmann Rabl, along with Ivan Filho, Raghunath Nambiar, Jantz Tran, Todor Ivanov, Zeev Lieber, Marten Rosselli, Jianfeng Zhan, Nicolas Poggi, Josep Berral, and Aaron Call. Of particular interest will be the announcement of the first Big Data Benchmarking Challenge, scheduled for the morning of Day 2.

The WBDB workshops are designed to make progress towards the development of industry-standard benchmarks for evaluating hardware and software solutions for big data applications.

 

Borrowed Interest in Custom Mobile Application Development and Design

The next talk in the UX guest lecture series, entitled “Borrowed Interest in Custom Mobile Application Development and Design”, will take place on Monday, June 29 at 5:00 pm in room 2A301 at Conestoga’s Doon campus as we welcome Chris and David Sinkinson of AppArmor.

Abstract

There’s no question that making a mobile app, while difficult, is becoming more and more mainstream. There is an entire generation of young minds who are keen to take on this new marketplace with new and amazingly innovative ideas. In this session, we’ll share our learnings on key design elements which have made our mobile safety platform, AppArmor, successful. In particular, we will blend the worlds of mobile app development and business to give the audience an understanding of what it takes to take their idea from inception to mainstream.

Speakers

David Sinkinson, co-founder and Lead Developer at AppArmor, and President of Marketing and Business Development at CutCom Software.

Chris Sinkinson, co-founder and Chief Executive Officer at AppArmor, and Founder and President of Product Development and Innovation at CutCom Software.

Chris loves playing with new gadgets. He always sees the potential in new technology and has an incredibly agile business mind. Of all of Chris’ strengths, though, his incredible determination to be successful and his dedication to AppArmor’s customers put him over the top. He is always willing to do what it takes to get the job done. David, Chris’ younger brother, is the “marketing and sales guy” at AppArmor. With experience in higher education, the telecom industry, and in marketing in general, David brings a unique perspective that is complementary to CutCom Software’s products. David loves a challenge, has an incredible drive and a burning desire to see AppArmor be as successful as possible. He loves getting to know and working with new people. Like Chris, he’s dedicated to his work and has a keen business mind.

Thanks to Dalibor Dvorski for sending this my way.

 

Why the SQL Standard does not need another way to do GROUP BY

In a recent article, Lukas Eder proposed enhancements to the SQL Standard’s GROUP BY syntax to include support for implicit grouping attributes, such as that provided by the Cypher query language. I applaud Lukas’ efforts to define useful changes to the SQL language, but I am going to try to show in this article that the proposed changes are incompatible with the existing SQL standard and commercial SQL implementations. In my examples, I am going to use my leagueDB schema that contains information about the teams and schedule for a hockey league, and is populated with NHL players from eight teams as of February, 2015:

[Figure: E-R diagram of the leagueDB schema (leaguedb_playerstats)]

Features of grouping and aggregation in SQL

The major issue that Lukas identifies in his article is the semantics of SQL grouped queries when the expressions in the SELECT clause do not match the expressions in the GROUP BY clause, so we consider the basics of grouped queries first. My SQL examples are executed using Oracle 11gR2, but most SQL implementations including DB2 and SQL Anywhere will produce similar results.

GROUP BY fundamentals

Most readers are familiar with the ubiquitous SELECT-FROM-WHERE-GROUP BY construction, but the order of evaluation of the clauses is important, particularly with the addition of PARTITION BY for window functions. The SQL/OLAP windowing extensions allow a user to divide the result set of a query into groups of rows called partitions. Logically, as part of the semantics of computing the result of a query specification, partitions are created after the groups defined by the GROUP BY clause, but before the evaluation of the final SELECT list – projection and duplicate elimination – and the query’s ORDER BY clause. Consequently, the order of evaluation of the clauses within an SQL statement is:

From → Where → Group by → Having → Window → Distinct → Order by.
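
To make the evaluation order concrete, here is a (hypothetical) query over the leagueDB player table that exercises every stage; each clause is annotated with its position in the pipeline:

SELECT DISTINCT                                -- 6. duplicate elimination
       country,
       COUNT(*) AS players,                    -- computed per group
       RANK() OVER ( ORDER BY COUNT(*) DESC )
          AS country_rank                      -- 5. window, sees the aggregates
FROM player                                    -- 1. from
WHERE height > 180                             -- 2. where
GROUP BY country                               -- 3. group by
HAVING COUNT(*) > 1                            -- 4. having
ORDER BY country_rank;                         -- 7. order by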

Consider a simple query specification containing SELECT-FROM-WHERE-GROUP BY, such as the following example query over the PLAYER table in the leagueDB schema, which returns, for each country represented in the database, the number of players taller than 200cm:

SELECT country, COUNT(*)
FROM player
WHERE height > 200 
GROUP BY country;

The (familiar) semantics of this query are as follows:

  1. The FROM clause is evaluated, producing a bag of tuples from the player table.
  2. The resulting rows are filtered by the search condition in the WHERE clause, restricting the bag of tuples to only those players known to have a height greater than 200cm.
  3. The resulting rows are then grouped by distinct values of the country attribute, as per the GROUP BY clause. Once the groups have been formed, the COUNT(*) aggregate function is computed over the rows in each group, yielding a set of result tuples containing pairs of values: the value of the country attribute, and the value of the COUNT(*) aggregate function, the number of rows in each group.
  4. The tuples produced by the GROUP BY clause are then projected using the attribute list in the SELECT clause.

In most commercial database system products, including SQL Anywhere, Oracle, DB2, and Microsoft SQL Server, an attempt to project an attribute from the SELECT clause that is not produced by the grouping operation of the GROUP BY clause will result in an error. For example, suppose we attempt the following query:

SELECT country, state_province, COUNT(*)
FROM player
WHERE height > 200 
GROUP BY country;

The inclusion of the state_province attribute in the SELECT list is problematic because that attribute is not produced by the grouping operation – hence each of the above systems will generate a syntax error for this request.

However, MySQL will not generate an error for the statement above. Instead, its engine will produce an arbitrary value drawn from the bag of rows combined to form each group. In many cases this value is taken from the first tuple encountered when creating a group, but there is no guarantee of determinism.

(Aside: note that if the additional attribute(s) in the SELECT list are formed from subquery expressions, and if any correlation variables in the subqueries are not taken from the set of grouping expressions or aggregate functions produced by the GROUP BY clause, then these queries as well are problematic and will return a syntax error).
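
For example (a sketch): in the query below the only correlation in the subquery, p.country, is a grouping expression, so the query is legal; correlating on p.state_province instead would draw the same error.

SELECT country,
       ( SELECT MAX(p2.height) FROM player p2
         WHERE p2.country = p.country ) AS tallest_cm,
       COUNT(*)
FROM player p
GROUP BY country;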

The astute reader will realize that if the additional attributes in the projection (the SELECT list) are functionally determined – that is, satisfy a functional dependency – by the grouping expressions, then the result of the query will be deterministic for any instance of the database. If we have

SELECT X, Y, COUNT(*)
FROM T
GROUP BY X;

and the functional dependency X->Y holds, then the query will return a deterministic result, since there will be a single value of Y for any group of rows produced by the GROUP BY clause. The ISO SQL Standard supports this notion directly, specifying how derived functional dependencies are computed for each clause in an SQL statement; but few products implement these semantics. MySQL, on the other hand, is too liberal, and its non-determinism with GROUP BY semantics is, in my view, quite unfortunate, for two reasons:

  • First, MySQL’s non-determinism is difficult to explain to novice SQL users learning the semantics of SQL’s GROUP BY operator. Students struggle when formulating queries because MySQL does not return an error when the query is malformed; MySQL assumes the programmer knows what she is doing, and hence the programmer can fail to realize the non-deterministic nature of her SQL request.
  • Second, discovering these semantic errors through testing is difficult. I have first-hand experience with situations where a MySQL database application passes all of its tests in a test environment, but fails in production due to the non-deterministic evaluation of the additional SELECT-list expressions, often exposed by different query plans over the production data.

What makes MySQL’s handling of these forms of queries truly unfortunate is that, in my experience both with students and with commercial applications, the inclusion of an extra attribute in a grouped query’s SELECT list is often simply an error on the part of the programmer. Unless the user attempts to port the query to another DBMS, or is knowledgeable enough about SQL and the application to discover the error through static analysis, the query will go into a production version of the application with a latent defect.

Derived functional dependencies can be computed from the primary key, foreign key, uniqueness, and CHECK constraints specified for the schema, in addition to analysis of the query’s GROUP BY and WHERE clauses. For example, in this similar query:

SELECT country, state_province, COUNT(*)
FROM player
WHERE height > 200 AND country = 'Canada' 
GROUP BY state_province;

the functional dependency {} -> country trivially holds, and by Armstrong’s axioms of functional dependencies we then have state_province -> country, so this query will be deterministic, even in MySQL. However, since most commercial systems fail to analyze functional dependencies as per the SQL:2011 Standard, queries like this will still return a syntax error – in Oracle 11gR2, the above query yields the common “ORA-00979: not a GROUP BY expression” error.

The GROUP BY clause isn’t optional

From the earliest days of SQL, the result of a query involving aggregation but not involving GROUP BY has been clear: the query result consists of a single row containing the results of the aggregate functions in the query’s SELECT list. For example,

SELECT COUNT(*)
FROM player
WHERE height > 200;

yields a single row, and with the instance of the leagueDB database I’m working with, the result of the COUNT(*) aggregate function is 1 – the only player in the database that is taller than 2m is Zdeno Chara, a defenceman with the Boston Bruins.

If the query contains a WHERE clause it is possible that the restriction of the bag of tuples from the FROM clause will be the empty set. In this case:

SELECT COUNT(*), AVG(weight)
FROM player
WHERE height > 200 AND country = 'Canada';

still yields a single row in the result, with values of 0 and NULL, respectively, for the two aggregate functions, since there are no Canadian players in the database taller than 200cm. This distinction between COUNT, which returns zero over empty input, and all other aggregate functions, which yield the NULL value when computed over the empty set – an arbitrary one, dating from the beginnings of the SQL language – is known as the ‘COUNT bug’, and it has been responsible for a multitude of DBMS query processing bugs in numerous systems over the past 30 years; see, for example, references [1-6] below.

However, now consider adding a GROUP BY clause to the query – for example, to count the number of tall players by country, which will return a row in the result for each country:

SELECT country, COUNT(*), AVG(weight)
FROM player
WHERE height > 210
GROUP BY country;

This query returns no rows whatsoever – the empty set, as there are no players in the database taller than 210cm (Zdeno Chara is listed at 206cm tall). Note that in this case a single row is not returned. Why? What the GROUP BY operator produces is a bag of tuples, with each tuple corresponding to a group of rows in the input. If the input to GROUP BY is the empty set, then there are no groups, and hence GROUP BY produces nothing. Similarly, a HAVING clause in a grouped query might restrict all of the tuples returned by the GROUP BY clause so that, once again, no tuples generated by the GROUP BY operator will appear in the result.

Now consider the earlier example (counting Canadian players taller than 200cm), this time with a GROUP BY clause added:

SELECT COUNT(*), AVG(weight)
FROM player
WHERE height > 200 AND country = 'Canada'
GROUP BY ();

On the surface, the addition of the GROUP BY () appears harmless since the original statement was over the entire set of rows in the player table, restricted by the query’s WHERE clause. However, due to the existence of GROUP BY () – syntax originally added when the ISO SQL Standard added support for GROUP BY GROUPING SETS – this query will not return the same result if the input to the GROUP BY clause is empty. This is an important distinction in the semantics of the SQL language – and it makes a difference in how one would write an application program that executes such a query.

The impact of WINDOW

Because window partitioning semantically follows a GROUP BY operator, the result of any aggregate function, such as SUM(), AVG(), and VARIANCE(), is available to the computation done for a partition. Hence a window provides another opportunity to perform grouping and ordering operations in addition to a query’s GROUP BY and ORDER BY clauses. However, any computation involving a window function’s result – for example, using it in a predicate — will require a more complex statement construction. Typically that construction will require a derived table to compute the window function result(s), with the derived table’s outer SELECT block containing the desired WHERE clause predicates. A window’s partition can consist of the entire input, a single row, or something in between. The advantage of a ‘window’ construct is that the rows within the partition can then be sorted to support the additional expressive power provided by window functions.
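
As a sketch (reusing the playerstats table from the leagueDB schema), a top-ten-scorers query must compute the window function inside a derived table so that its result can be referenced in an outer WHERE clause:

SELECT firstname, lastname, points
FROM ( SELECT p.firstname, p.lastname,
              SUM(ps.goals + ps.assists) AS points,
              RANK() OVER ( ORDER BY SUM(ps.goals + ps.assists) DESC )
                 AS pts_rank
       FROM player p JOIN playerstats ps ON ( ps.player = p.playerid )
       GROUP BY p.playerid, p.firstname, p.lastname ) DT
WHERE pts_rank <= 10;  -- a predicate over a window function's result
                       -- must live in the outer block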

Discussion

As we have seen, it makes a semantic difference to a query’s result whether or not the query contains a GROUP BY clause, and MySQL’s ‘flexibility’ in handling grouped queries can be argued to be either a feature or a bug, depending on your point of view. One can argue that SQL’s overall semantics are ill-formed: the arbitrary distinction between COUNT(*) and other aggregate functions is unfortunate, and GROUP BY‘s treatment of NULL values – as equivalent values for grouping, set operations, and duplicate elimination – is nothing short of bizarre, given the three-valued logic used in the remaining portions of the language. While the ISO-IEC committee has not historically been overly concerned with backwards compatibility or with precisely matching the behaviour of existing DBMS implementations, “tinkering” with the syntax of the language given the above is not, I believe, in the best interests of commercial products or their customers.

Lukas’ proposed extensions to the SQL language are well-intentioned but they unfortunately conflict with behaviour already specified in the SQL Standard. That said, what I would enjoy seeing is the holistic design of a language that encompasses the range of relational operators described above – projection, duplicate elimination, restriction, grouping, and partitioning, coupled with the treatment of NULL values – and puts the semantics of each on a firmer footing. In my view we as a database community should be looking at better overall replacements for a query language that, despite its visionary and unprecedented success, is now 40 years old. Surely we have learned some things about language design in the past four decades.

References

[1] Werner Kießling (August 1985). On Semantic Reefs and Efficient Processing of Correlation Queries with Aggregates. In Proceedings of the 11th International Conference on Very Large Data Bases, August 21-23, 1985, Stockholm, Sweden, pp. 241-250.

[2] Günter von Bültzingsloewen (September 1987). Translating and Optimizing SQL Queries Having Aggregates. In Proceedings of the 13th International Conference on Very Large Data Bases, September 1-4, 1987, Brighton, England, pp. 235-243.

[3] Richard A. Ganski and Harry K. T. Wong (May 1987). Optimization of Nested Queries Revisited. In Proceedings of the ACM SIGMOD Conference, San Francisco, California, May 1987, pp. 23-33.

[4] Umeshwar Dayal (September 1987). Of Nests and Trees: A Unified Approach to Processing Queries That Contain Nested Subqueries, Aggregates, and Quantifiers. In Proceedings of the 13th International Conference on Very Large Data Bases, September 1-4, 1987, Brighton, England, pp. 197-208.

[5] M. Negri, G. Pelagatti, and L. Sbattella (September 1991). Formal Semantics of SQL Queries. ACM Transactions on Database Systems 16(3), pp. 513-534.

[6] M. Muralikrishna (August 1992). Improved Unnesting Algorithms for Join Aggregate SQL Queries. In Proceedings of the 18th International Conference on Very Large Data Bases (VLDB), Vancouver, British Columbia, pp. 91-102.

 

JPA and native SQL queries – part deux

In a previous article I outlined how one can use JPA’s @SqlResultSetMapping annotation, together with the createNativeQuery() method, to map a native SQL query to a result set composed of something other than JPA-managed object instances.

In this follow-on article I’ll comment on a recent JPA extension to the @SqlResultSetMapping annotation, the @ConstructorResult annotation, which provides a much greater degree of flexibility for Java EE applications. As with my last article, I will be using the leagueDB database, an example database for a hockey league that I constructed with the help of my students in my recent Java Enterprise Applications course, using Java EE 7 with Apache Derby 10.11 as the database server and Hibernate 4.3.6 as the JPA implementation.

In this example, we will create a query that returns the top 10 hockey players with the most points (goals and assists) in the entire league. Player statistics in the leagueDB database are kept in a separate “playerstats” table that contains each player’s scoring statistics for each season, as illustrated in this portion of the database’s E-R diagram (this one from Oracle 11g):

[Figure: E-R diagram of the player and playerstats tables (leaguedb_playerstats)]

Here’s the query to retrieve the top ten players in scoring, all time:

@SuppressWarnings("unchecked")
List<PlayersTotalPoints> allPlayers = em.createNativeQuery(
    "SELECT firstname, lastname, t.teamname, points " +
    "FROM (SELECT p.playerid, firstname, lastname, coalesce( sum(goals + assists), 0) as points " +
          "FROM player p LEFT OUTER JOIN playerstats ps ON(ps.player = p.playerID) " +
          "GROUP BY p.playerid, p.firstname, p.lastname ) DT " +
       "JOIN ROSTER r ON( DT.playerid = r.player ) " +
       "JOIN TEAM t ON( t.teamid = r.team ) " +
       "JOIN LEAGUE l ON( t.league = l.leagueid ) " +
    "WHERE l.leaguename = 'National Hockey League' " +
    "ORDER BY points DESC ", "TotalPointsMapping")
    .setMaxResults(10)
    .getResultList();

The query includes an outer join to ensure that all players are considered in the computation of the top ten list, even if they don’t have a row in the PLAYERSTATS table – perhaps as a result of injury or a recent signing. Also note that we are using the setMaxResults() method, which automatically adds the “FETCH FIRST 10 ROWS ONLY” syntax to the statement before it is sent to the Derby 10.11 server.
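
For reference, here is a sketch of what the “TotalPointsMapping” used above might look like. The shape is an assumption for illustration: the annotation must be placed on an entity class, and the PlayersTotalPoints constructor signature is inferred from the query’s select list:

import javax.persistence.ColumnResult;
import javax.persistence.ConstructorResult;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.SqlResultSetMapping;

// The unmanaged POJO the mapping instantiates (an assumed shape):
class PlayersTotalPoints {
    PlayersTotalPoints(String firstName, String lastName,
                       String teamName, Long points) { /* ... */ }
}

// The mapping itself must be declared on an entity class:
@Entity
@SqlResultSetMapping(
    name = "TotalPointsMapping",
    classes = @ConstructorResult(
        targetClass = PlayersTotalPoints.class,
        columns = {
            @ColumnResult(name = "firstname", type = String.class),
            @ColumnResult(name = "lastname",  type = String.class),
            @ColumnResult(name = "teamname",  type = String.class),
            @ColumnResult(name = "points",    type = Long.class) }))
public class Player {
    @Id private int playerid;
    // ... other mapped attributes ...
}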

JPA and native SQL queries – part un

Hibernate is by far the most popular of the various object-relational mapping tools available for Java, and is one of the ORM toolkits that support the Java Persistence API, which forms part of the Java EE 7 specification. The current JPA specification version is 2.1 (released April 2013), and you can find the specification for JPA 2.1 online. The reference implementation for JPA 2.1 is Oracle EclipseLink – in years past it was JBoss’ Hibernate.

As with Hibernate’s HQL language, the JPQL query language supported by JPA supports a basic SQL query specification: SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses. JPQL also supports outer joins, and most recently JPA 2.1 added support for an explicit ON condition for outer joins, using the ON keyword and syntax familiar to developers of SQL applications.
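
As a brief sketch of the JPA 2.1 feature, assuming leagueDB-style Player and PlayerStats entities with a mapped stats collection:

// em is a javax.persistence.EntityManager; the entity names and the
// "stats" relationship are assumptions for illustration.
List<Object[]> rows = em.createQuery(
        "SELECT p.lastName, s.goals " +
        "FROM Player p LEFT JOIN p.stats s ON s.season = :season",
        Object[].class)
    .setParameter("season", 2014)
    .getResultList();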

The problem, of course, is that JPQL, like Hibernate’s HQL language, doesn’t support every useful SQL construction you might like to use in an application. I deliberately include the word useful here because not all SQL constructions have the same utility. Nonetheless, SQL constructions such as derived tables, particularly with outer joins, are virtually essential in all SQL applications, and derived tables are supported in neither the HQL nor the JPQL language.

Further complicating matters is that with JPA – as with Hibernate – the underlying assumption is that Java applications will utilize JPA services to retrieve JPA-managed objects from the underlying relational database; once these are instantiated as objects by the object-relational mapper, the application will reference and manipulate them, updating them as necessary, until the changes are persisted through the JPA entity manager.

Of course, in production applications one will want to query the database in all sorts of ways, using SQL’s expressive power to (1) minimize the amount of data that must be processed to compute the intended result, (2) avoid multiple requests sent to the server (the N+1 SELECTs problem), and (3) minimize the amount of result data that has to be transmitted to the application over the wire. To support this, JPA offers two critically important facilities. The first is that JPA permits the developer to create a query using native SQL, via the EntityManager’s createNativeQuery() method. The second is the ability to specify what JPA terms a result set mapping, using the @SqlResultSetMapping annotation, so that applications can explicitly tell JPA how to map arbitrary queries to unmanaged POJOs – or plain scalar values, if one chooses – without the tedious hand conversion between relational values and Java object types that JPA and other ORM toolkits were designed to eliminate in the first place.
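
Combining the two facilities, a minimal sketch (all names here are assumptions) looks like this; the mapping is declared on an entity class, and each result row is returned as an array of scalar values:

import java.util.List;
import javax.persistence.ColumnResult;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.SqlResultSetMapping;

// A scalar (non-entity) result set mapping, declared on an entity class:
@Entity
@SqlResultSetMapping(
    name = "CountryCountMapping",
    columns = { @ColumnResult(name = "country"),
                @ColumnResult(name = "players") })
public class Player {
    @Id private int playerid;
    // ... other mapped attributes ...

    // Each row of the native query arrives as an Object[] pair.
    @SuppressWarnings("unchecked")
    public static List<Object[]> playersByCountry(EntityManager em) {
        return em.createNativeQuery(
            "SELECT country, COUNT(*) AS players " +
            "FROM player GROUP BY country",
            "CountryCountMapping").getResultList();
    }
}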

CFP: Sixth Workshop on Big Data Benchmarking (WBDB 2015)

The Sixth International Workshop on Big Data Benchmarking will be held in Toronto, Ontario at the University of Toronto on 16-17 June 2015. The WBDB workshops are designed to make progress towards the development of industry-standard benchmarks for evaluating hardware and software solutions for big data applications.

Topics to be discussed at the Workshop include, but are not limited to:

  • Ideas and proposals for TPC and SPEC-style benchmarks
  • Synthetic data generation; measurement and metrics
  • Benchmarking next-generation hardware systems
  • Graph analytics
  • New applications scenarios based on social media
  • Socio-cultural dynamics and applications to security and privacy

Organizing committee

  • Chaitan Baru, San Diego Supercomputer Center (SDSC) UC San Diego
  • Tilmann Rabl, Middleware Services Research Group (MSRG), University of Toronto

Short versions of papers (4-8 pages) should be submitted by May 1, 2015, using the Microsoft conference management system. Notification of acceptance is scheduled for May 15. Camera-ready copies, suitable for final publication in Springer LNCS format, are expected by 1 October 2015, after the conclusion of the Workshop.

 

Eclipse OEPE, Glassfish and port number conflicts

The software stack I’m currently using for teaching Java EE Application Programming includes:

  • Java JDK 1.7.0 r71, 32-bit edition;
  • Eclipse Luna 32-bit, with the Oracle OEPE (Oracle Enterprise Pack for Eclipse) version 12.1.3.1.1, which contains various additional Web Tools Platform (WTP) Eclipse plugins;
  • Java EE 7 32-bit, which includes the Java Enterprise Edition 7 jar files, the Java 1.7.0 JRE, the Derby pure-Java database system (version 10.9.1.0), and the open-source GlassFish 4.0 application server;
  • Apache Tomcat 8.0.15 web server;
  • Java Hibernate 4.3.6, available here.

Integrating Apache Tomcat with Eclipse has been straightforward, even though I had to alter my Tomcat installation to use port 8500 so that it does not conflict with other software on my machine.

This week represented my first attempt at developing a tutorial on integrating Glassfish 4.0 with Eclipse Luna so as to develop a 3-tiered Java EE application. The non-standard part of this configuration is again port numbers: by default Glassfish utilizes port 8080 for unsecured requests, and port 4848 for administrator clients. The use of port 8080 for Glassfish is a non-starter: other software also utilizes port 8080, and on my machine 8080 is used by Oracle 11g XE. Consequently, the port numbers used by Glassfish must be changed, which is done by editing the domain.xml file in the domain1 subdirectory to read as follows. I have changed the user traffic ports to 19600 (unsecured) and 19601 (secured), with the administrator port left unchanged.

        <network-listeners>
          <network-listener port="19600" protocol="http-listener-1" transport="tcp" name="http-listener-1" thread-pool="http-thread-pool"></network-listener>
          <network-listener port="19601" protocol="http-listener-2" transport="tcp" name="http-listener-2" thread-pool="http-thread-pool"></network-listener>
          <network-listener port="4848" protocol="admin-listener" transport="tcp" name="admin-listener" thread-pool="admin-thread-pool"></network-listener>
        </network-listeners>

Changing the port numbers for Glassfish by editing domain.xml works fine, but there is a glitch: the Glassfish plugin that ships with Eclipse Luna OEPE 12.1.3.1.1 does not support starting Glassfish on a port other than 8080. This makes Java EE application development with Eclipse impossible.

The solution is to upgrade the Eclipse Glassfish Tools plugin to the latest version: 7.3.4.201502282121. Unfortunately, this upgrade requires a corresponding upgrade (dated 5 January 2015) to a large number of other Eclipse plugins and, in turn, requires one to move to Java 8 as the JRE for Eclipse Luna, as several of the new plugin versions are incompatible with Java 7.

Fortunately, the upgrades, though tedious, all completed successfully, and creating a new Glassfish 4.0 server instance within Eclipse using the new Glassfish Tools plugin correctly discovers that Glassfish’s domain1 domain is configured to use port 19600.

[Screenshot: Glassfish server properties in Eclipse (glassfish_properties)]

Note that Java 8 is only required for Eclipse and its plugins – one can still customize the Java version required for any application, and for most Java EE applications Java 7 or 8 will work well.

 

Intuitive Design with Google’s Material Design Specification

The next talk in the User Experience Guest Lecture Series is entitled “Intuitive Design with Google’s Material Design Specification” and our guest presenter is my friend Cheryl Taylor, a User Experience Designer with Intelligent Mechatronic Systems in Waterloo.

Date: Tuesday, March 31, 17:00 to 18:00

Location: Doon 2A301

Abstract

Explore the philosophy behind Google’s material design specification. See how it encourages user interaction in a realistic and intuitive way by combining visual cues from life with classic principles of design and animation to create a unified standard for user experience across devices and platforms. The material design language expands on card motifs, focusing on grid-based layouts, responsive animations and transitions, padding, and depth created by light and shadows. Based on the concept of paper and ink, Google developed this specification with material as the metaphor. Intended to be as tactile as paper and ink, surfaces have depth and light, and interactions take place in a realistic and familiar way. Join us in this free and self-contained lecture as the User Experience Guest Lecture Series celebrates its third year and ninth lecture.

About Cheryl Taylor

Cheryl Taylor is a User Experience Designer on the mobile team at Intelligent Mechatronic Systems. She has been working as a designer since the late 90s. In recent years her work has focused on design and usability for automotive, insurance, electronic health record, efinance, and customer relationship management software products. Cheryl studied Design and Fine Art at Ontario College of Art and Design University, Art History at University of Guelph, and Interactive Multimedia at Sheridan College Institute of Technology and Advanced Learning. Since 2004, Cheryl has been a graphic and Web design volunteer for numerous non-governmental organizations through the UN online volunteering program.

For more information on this talk or the User Experience Guest Lecture Series, contact Dalibor Dvorski at ddvorski@conestogac.on.ca.

 