Course Identification

Introduction to Big Data Analysis and Machine Learning
20184241

Lecturers and Teaching Assistants

Prof. Eran Segal
N/A

Course Schedule and Location

2018
First Semester
Sunday, 13:15 - 14:00, Goldsmith, room 108
29/10/2017

Field of Study, Course Type and Credit Points

Mathematics and Computer Science: Seminar; 1.00 points
Life Sciences (Brain Sciences: Systems, Computational and Cognitive Neuroscience Track): 1.00 points

Comments

N/A

Prerequisites

No

Restrictions

14

Language of Instruction

English

Attendance and participation

Required in at least 80% of the lectures

Grade Type

Pass / Fail

Grade Breakdown (in %)

30%
70%

Evaluation Type

Seminar

Scheduled date 1

N/A
N/A
-
N/A

Estimated Weekly Independent Workload (in hours)

2

Syllabus

                • Machine learning basics - capacity, overfitting and underfitting, types of errors, the “No Free Lunch” theorem, regularization, performance estimation (bias, variance, the holdout method (train/test split), stratification), model evaluation, cross-validation, model and algorithm selection, bootstrap methods

                • Linear regression models, regularization methods

                • Linear mixed models

                • Classification - different types of classifiers, performance measures (precision, recall, F1, ROC and PR curves), imbalanced datasets

                • Statistical tests - which test to use and when, hypothesis testing, p-values, correlations, permutation tests, confidence intervals, maximum likelihood

                • Dimensionality reduction - the curse of dimensionality, SVD, PCA (and variants such as incremental PCA, randomized PCA and kernel PCA), LLE, LDA, t-SNE, autoencoders

                • Data preprocessing - handling missing data, imputation, categorical data, scaling, feature importance, and possibly basic signal processing

                • Model evaluation and hyperparameter tuning - cross-validation, grid and random search, learning curves, the bias-variance tradeoff

                • Clustering and other unsupervised methods - K-means, hierarchical clustering

                • Novelty and outlier detection methods, RANSAC

                • Analysis of variance - ANOVA, t-test, Kruskal-Wallis test

                • Visualizations - best practices, packages

                • Tree-based methods, random forests

                • Ensemble learning - bagging and pasting, boosting, stacking, XGBoost

                • Introduction to neural networks - basics, basic architectures, and when to use them instead of classical methods

                • Machine learning pipeline - steps, best practices, handling large datasets

                • Reinforcement learning - introduction

                • Probabilistic graphical models - introduction

Learning Outcomes

Upon successful completion of this course students should be able to:

Demonstrate a basic understanding of key data science concepts in machine learning, data analysis, classification, clustering, and neural networks

Reading List

N/A

Website

N/A