Course Identification

Introduction to Big Data and Machine Learning
20223321

Lecturers and Teaching Assistants

Leon Anavy, Prof. Zohar Yakhini
Alon Oring, Ben Galili

Course Schedule and Location

2022
First Semester
Wednesday, 14:15 - 16:30

Tutorials
Wednesday, 17:15 - 18:00
27/10/2021
18/03/2022

Field of Study, Course Type and Credit Points

Life Sciences (Computational and Systems Biology Track): Lecture; Obligatory; Core; 2.50 points

Comments

Class will take place 14:15-16:30 every Wednesday at IDC.
Recitation will take place 17:00-18:00, after which the course staff will hold office hours for consultation until 19:00.
The first class will take place on 27/Oct/2021 and the last class will take place on 26/Jan/2022.

Prerequisites

A basic programming course, plus basic calculus and linear algebra

Restrictions

20

Language of Instruction

English

Registration by

17/10/2021

Attendance and participation

Obligatory

Grade Type

Numerical (out of 100)

Grade Breakdown (in %)

60%
40%
Project

Evaluation Type

Final assignment

Scheduled date 1

N/A
N/A
-
N/A

Estimated Weekly Independent Workload (in hours)

N/A

Syllabus

Part 1 - Introduction to Machine Learning 

Introduction, linear regression 
Python libraries: pandas, numpy, visualization libraries 
Evaluation, training and test sets, ROC curves
Classification
Clustering, PCA

Part 2 - Data Science and Statistics 

Density estimation, MLE, Bayes classification
Statistics for scientists: correlations, p-values, and multiple testing

Advanced statistical methods: non-parametric tests
A mini project in analyzing high throughput data

Part 3 - Class Workshops

NGS Data Analysis
End-to-end data analysis and model building

Part 4 - Presentation of the mini project results

 

Learning Outcomes

Upon successful completion of this course students should be able to:

- understand machine learning algorithms and apply them to data 

- statistically assess observations in data, including correlations

- launch and use Big Data platforms to analyze large volumes of data

- understand and configure machine learning packages including Deep Learning and SVM

- analyze large volumes of experimental data and present results   

Reading List

  • An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani
  • Pattern Recognition and Machine Learning by Bishop

Website

N/A