This website hosts the resources that accompany the papers our team has published on integrating Big Data technologies into the computing curricula.

Paper: SQL: From Traditional Databases to Big Data

“SQL: From Traditional Databases to Big Data”, Yasin N. Silva, Isadora Almeida, and Michell Queiroz, in proceedings of the 47th ACM Symposium on Computer Science Education (SIGCSE '16), Memphis, Tennessee, USA, 2016.

Download Paper (PDF)

Download SIGCSE Presentation Slides (PDF)

Abstract. The Structured Query Language (SQL) is the main programming language designed to manage data stored in database systems. While SQL was initially used only with relational database management systems (RDBMS), its use has been significantly extended with the advent of new types of database systems. Specifically, SQL has been found to be a powerful query language in highly distributed and scalable systems that process Big Data, i.e., datasets with high volume, velocity and variety. While traditional relational databases represent now only a small fraction of the database systems landscape, most database courses that cover SQL consider only the use of SQL in the context of traditional relational systems. In this paper, we propose teaching SQL as a general language that can be used in a broad range of database systems from traditional RDBMSs to Big Data systems. This paper presents well-structured guidelines to introduce SQL in the context of new types of database systems including MapReduce, NoSQL and NewSQL. A key contribution of this paper is the description of an array of course resources, e.g., virtual machines, sample projects, and in-class exercises, to enable a hands-on experience with SQL across a broad set of modern database systems.

Resources

SQL in MapReduce Systems

SQL in NoSQL Systems

SQL in NewSQL Systems

Paper: Integrating Big Data into the Computing Curricula

“Integrating Big Data into the Computing Curricula”, Yasin N. Silva, Suzanne W. Dietrich, Jason M. Reed, and Lisa M. Tsosie, in proceedings of the 45th ACM Symposium on Computer Science Education (SIGCSE '14), Atlanta, USA, 2014.

Download Paper (PDF)

Download SIGCSE Presentation Slides (PDF)

Abstract. An important recent technological development in computer science is the availability of highly distributed and scalable systems to process Big Data, i.e., datasets with high volume, velocity and variety. Given the extensive and effective use of systems incorporating Big Data in many application scenarios, these systems have become a key component in the broad landscape of database systems. This fact creates the need to integrate the study of Big Data Management Systems as part of the computing curricula, particularly as part of database courses. This paper presents well-structured guidelines to perform this integration by describing the important types of Big Data systems and demonstrating how each type of system can be integrated into the curriculum. A key contribution of this paper is the description of a wide array of course resources, e.g., virtual machines, sample projects, and in-class exercises, and how these resources support the learning outcomes and enable a hands-on experience with Big Data technologies.

Resources

These are the PowerPoint slides prepared for the three Big Data learning units covered in the paper. Instructors are welcome to reuse and modify these slides.

The following resources are available to instructors. To request them, please send an email to Yasin Silva (ysilva [at] asu [dot] edu).

  • VMware Virtual Machine: This VM (openSUSE Linux, Hadoop, HBase) contains most of the resources described in the paper: sample datasets, the MStation data generator, and MapReduce and NoSQL in-class exercises with solutions.
  • MapReduce Project: Additional questions for a MapReduce project assignment.
  • End-of-class survey: The survey used to evaluate the extent to which the learning objectives were achieved.