SDS 403: Fundamentals of Data Science and Statistics

Course Title

Fundamentals of Data Science and Statistics

Course Code

SDS 403

Course Type

Mandatory

Level

Master’s

Year / Semester

1st / 2nd (subject to change)

Instructors' Names

Simone Bacchio, Mihalis Nicolaou, Charalambos Chrysostomou

ECTS

10

Lectures / week

1 (2h)

Laboratories / week

1 (2h)

Course Purpose and Objectives

The course introduces students to data science, big data analysis and statistics. It focuses on statistical methods for data scientists, including random variables, probability theory, continuous and discrete distributions, inference, estimation, hypothesis testing and statistical significance. It also develops practical skills and tools for visualizing, exploring, storing and processing data, and introduces cluster-computing frameworks (Hadoop, Spark).

Learning Outcomes

By the end of the course, students will have a good grasp of the statistical knowledge relevant to data science and be able to apply it to data using modern tools and libraries. They will also be able to perform exploratory data analysis and apply introductory visualization techniques. In addition, they will be familiar with cluster-computing frameworks and be able to apply and explain programming models such as MapReduce.

Prerequisites

None

Requirements

None

Course Content

Introduction to Statistics and Statistical Learning: linear algebra review; statistics for data science; probability, random variables, correlation and causation, common probability distributions; statistical inference, estimation, hypothesis testing and statistical significance; introduction to Bayesian methods, regression, classification and time-series analysis.

Data Programming and Big Data Analysis: numerical tools and libraries for managing and analysing data of various types; exploratory data analysis and visualization; data structures for manipulating and storing data; data collection, standardization and analysis; introduction to cluster-computing frameworks (Hadoop, Spark).

Teaching Methodology

Lectures, exercises

Bibliography

Heumann, C., & Schomaker, M. Introduction to Statistics and Data Analysis. Springer, 2016.

Haslwanter, T. An Introduction to Statistics with Python. Springer, 2016.

Tukey, J. W. Exploratory Data Analysis. Addison-Wesley, 1977.

Karau, H., Konwinski, A., Wendell, P., & Zaharia, M. Learning Spark: Lightning-Fast Big Data Analysis. O'Reilly Media, Inc., 2015.

White, T. Hadoop: The Definitive Guide. O'Reilly Media, Inc.

Gkoulalas-Divanis, A., & Labbi, A. (Eds.). Large-Scale Data Analytics. Springer, 2012.

Assessment

25% coursework, 75% exam

Language

English