Course Identification

Title:

Introduction to Big Data Analysis and Machine Learning

Code:

20184241

Lecturers and Teaching Assistants

Lecturers:

Prof. Eran Segal

TAs:

N/A

Course Schedule and Location

Year:

2018

Semester:

First Semester

When / Where:

Sunday, 13:15 - 14:00, Goldsmith, room 108

First Lecture:

29/10/2017

Field of Study, Course Type and Credit Points

Mathematics and Computer Science: Seminar; 1.00 points
Life Sciences (Brain Sciences: Systems, Computational and Cognitive Neuroscience Track): 1.00 points

Comments

N/A

Prerequisites

No

Restrictions

Participants:

14

Language of Instruction

English

Attendance and participation

Required in at least 80% of the lectures

Grade Type

Pass / Fail

Grade Breakdown (in %)

Attendance:

30%

Seminar:

70%

Evaluation Type

Seminar

Scheduled date 1

Date / due date

N/A

Location

N/A

Time

N/A

Remarks

N/A

Estimated Weekly Independent Workload (in hours)

2

Syllabus

• Machine learning basics - capacity, overfitting and underfitting, types of errors, the “No Free Lunch” theorem, regularization; performance estimation - bias, variance, the holdout method (train/test split), stratification, model evaluation, cross-validation, model and algorithm selection, bootstrap methods

• Linear regression models, regularization methods

• Linear mixed models

• Classification - different types of classifiers, performance measures (precision, recall, F1, ROC and PR curves), imbalanced datasets

• Statistical tests - choosing which test to use and when, hypothesis testing, p-values, correlations, permutation tests, confidence intervals, maximum likelihood

• Dimensionality reduction - the curse of dimensionality, SVD, PCA (and variants such as incremental PCA (IPCA), randomized PCA and kernel PCA (kPCA)), LLE, LDA, t-SNE, autoencoders

• Data preprocessing - handling missing data, imputation, categorical data, scaling, feature importance, and possibly basic signal processing

• Model evaluation and hyperparameter tuning - cross-validation, grid and random search, learning curves, the bias-variance tradeoff

• Clustering and other unsupervised methods - K-means, hierarchical clustering

• Novelty and outlier detection methods, RANSAC

• Analysis of variance - ANOVA, t-test, Kruskal-Wallis test

• Visualization - best practices, packages

• Tree-based methods, random forests

• Ensemble learning - bagging and pasting, boosting, stacking, XGBoost

• Introduction to neural networks - basics, basic architectures, and when to use them instead of classical methods

• Machine learning pipeline - steps, best practices, handling large datasets

• Reinforcement learning - introduction

• Probabilistic graphical models - introduction

Learning Outcomes

Upon successful completion of this course students should be able to:

Demonstrate a basic understanding of key data science concepts in machine learning, data analysis, classification, clustering, and neural networks.

Reading List

N/A

Website

N/A