SDS 403: Fundamentals of Data Science and Statistics
Course Title |
Fundamentals of Data Science and Statistics |
||||||
Course Code |
SDS 403 |
||||||
Course Type |
Mandatory |
||||||
Level |
Master’s |
||||||
Year / Semester |
1st / 2nd (subject to change) |
||||||
Instructor’s Name |
Mihalis Nicolaou, Charalambos Chrysostomou |
||||||
ECTS |
10 |
Lectures / week |
1 (2h) |
Laboratories / week |
1 (2h) |
||
Course Purpose and Objectives |
Introduce students to data science, big data analysis and statistics, with a focus on statistical methods for data scientists: random variables, probability theory, continuous and discrete distributions, inference, estimation, hypothesis testing and statistical significance. Develop practical skills and tools for visualizing, exploring, storing and processing data, and introduce cluster-computing frameworks (Hadoop, Spark). |
||||||
Learning Outcomes |
By the end of the course, students will have a good grasp of the statistical knowledge relevant to data science and be able to apply it to data using modern tools and libraries. They will be able to perform exploratory data analysis and apply introductory visualization techniques. They will also be familiar with cluster-computing frameworks and be able to apply and explain programming models such as MapReduce. |
||||||
Prerequisites |
None |
Requirements |
None |
||||||
Course Content |
Introduction to Statistics and Statistical Learning: linear algebra review; statistics for data science; probability, random variables, correlation and causation, common probability distributions; statistical inference, estimation, hypothesis testing and statistical significance; introduction to Bayesian methods, regression, classification and time-series analysis.
Data Programming and Big Data Analysis: numerical tools and libraries for managing and analysing data of various types; exploratory data analysis and visualization; data structures for manipulating and storing data; data collection, standardization and analysis; introduction to cluster-computing frameworks (Hadoop, Spark). |
||||||
Teaching Methodology |
Lectures, exercises |
||||||
Bibliography |
C. Heumann, M. Schomaker, “Introduction to Statistics and Data Analysis”, Springer, 2016.
T. Haslwanter, “An Introduction to Statistics with Python”, Springer, 2016.
J. W. Tukey, “Exploratory Data Analysis”, Addison-Wesley, 1977.
H. Karau, A. Konwinski, P. Wendell, M. Zaharia, “Learning Spark: Lightning-Fast Big Data Analysis”, O'Reilly Media, 2015.
T. White, “Hadoop: The Definitive Guide”, O'Reilly Media.
A. Gkoulalas-Divanis, A. Labbi (Eds.), “Large-Scale Data Analytics”, Springer, 2012. |
||||||
Assessment |
25% coursework, 75% exam |
||||||
Language |
English |