SDS 403: Fundamentals of Data Science and Statistics

Course Title	Fundamentals of Data Science and Statistics
Course Code	SDS 403
Course Type	Mandatory
Level	Master’s
Year / Semester	1st / 2nd (subject to change)
Instructors' Names	Simone Bacchio, Mihalis Nicolaou, Charalambos Chrysostomou
ECTS	10	Lectures / week		1 (2h)	Laboratories / week		1 (2h)
Course Purpose and Objectives	Introduce students to data science, big data analysis and statistics, with a focus on statistical methods for data scientists: random variables, probability theory, continuous and discrete distributions, inference, estimation, hypothesis testing and statistical significance. Develop a set of practical skills and tools for visualizing, exploring, storing and processing data, and introduce cluster-computing frameworks (Hadoop, Spark).
Learning Outcomes	By the end of the course, students will have a good grasp of the statistical knowledge related to data science and be able to apply it to data using modern tools and libraries. They will be able to perform exploratory data analysis and apply introductory visualization techniques. They will also be familiar with cluster-computing frameworks and be able to apply and explain programming models such as MapReduce.
Prerequisites	None		Requirements			None
Course Content	Introduction to Statistics and Statistical Learning: linear algebra review; statistics for data science; probability, random variables, correlation and causation, common probability distributions; statistical inference; estimation, hypothesis testing and statistical significance; introduction to Bayesian methods, regression, classification and time-series analysis. Data Programming and Big Data Analysis: numerical tools and libraries for managing and analysing data of various types; exploratory data analysis and visualization; data structures for manipulating and storing data; data collection, standardization and analysis; introduction to cluster-computing frameworks (Hadoop, Spark).
Teaching Methodology	Lectures, exercises
Bibliography	Heumann, C., & Schomaker, M. Introduction to Statistics and Data Analysis. Springer, 2016. Haslwanter, T. An Introduction to Statistics with Python. Springer, 2016. Tukey, J. W. Exploratory Data Analysis. Addison-Wesley, 1977. Karau, H., Konwinski, A., Wendell, P., & Zaharia, M. Learning Spark: Lightning-Fast Big Data Analysis. O'Reilly Media, 2015. White, T. Hadoop: The Definitive Guide. O'Reilly Media. Gkoulalas-Divanis, A., & Labbi, A. (Eds.). Large-Scale Data Analytics. Springer, 2012.
Assessment	25% coursework, 75% exam
Language	English