The Fifth International Workshop on Big Data Benchmarking (WBDB) will be held at the Hasso Plattner Institute in Potsdam, Germany, on August 5-6, 2014. The WBDB workshops are designed to make progress towards the development of industry-standard benchmarks for evaluating hardware and software solutions for big data applications.
Topics to be discussed at the Workshop include, but are not limited to:
- Data features: New feature sets of data, including high-dimensional data, sparse data, event-based data, and enormous data sizes.
- System characteristics: System-level issues, including large-scale and evolving system configurations, shifting loads, and heterogeneous technologies for big data and cloud platforms.
- Implementation options: Different implementation options such as SQL, NoSQL, the Hadoop software ecosystem, and different implementations of HDFS.
- Workload: Representative big data business problems and corresponding benchmark implementations. Specification of benchmark applications that represent the different modalities of big data, including graphs, streams, scientific data, and document collections.
- Hardware options: Evaluation of new options in hardware including different types of HDD, SSD, and main memory, and large-memory systems, and new platform options that include dedicated commodity clusters and cloud platforms.
- Synthetic data generation: Models and procedures for generating large-scale synthetic data with requisite properties.
- Benchmark execution rules: For example, data scale factors, benchmark versioning to account for rapidly evolving workloads and system configurations, and benchmark metrics.
- Metrics for efficiency: Measuring the efficiency of a solution, e.g. based on costs of acquisition, ownership, energy, and/or other factors, while encouraging innovation and avoiding benchmark escalations that favor large inefficient configurations over small efficient ones.
- Evaluation frameworks: Tool chains, suites and frameworks for evaluating big data systems.
- Early implementations: Early implementations of the Deep Analytics Pipeline or BigBench, and lessons learned in benchmarking big data applications.
- Enhancements: Proposals to augment these benchmarks, e.g. by adding more data genres (such as graphs) or incorporating a range of machine learning and other algorithms, are encouraged.
The Workshop is organized by:
- Chaitan Baru, San Diego Supercomputer Center (SDSC), UC San Diego
- Tilmann Rabl, Middleware Services Research Group (MSRG), University of Toronto
- Kai Sachs, SAP AG
- Matthias Uflacker, Hasso Plattner Institute
- Henning Schmitz, SAP Innovations Center
- Meikel Poess, Oracle
Short versions of papers (4-8 pages) should be submitted by May 4, 2014, using the EasyChair conference management system. Notification of acceptance is scheduled for June 8. Camera-ready copies of each paper, in Springer LNCS format, are due on July 20, prior to the Workshop.