Course Identification

Applied Machine Learning for Life Sciences
20243531

Lecturers and Teaching Assistants

Dr. Leeat Yankielowicz-Keren, Ms. Ortal Dayan
N/A

Course Schedule and Location

2024
First Semester
Monday, 16:00 - 20:00, FGS, Rm A
11/12/2023
26/02/2024

Field of Study, Course Type and Credit Points

Life Sciences: Lecture; 4.00 points
Life Sciences (Computational and Systems Biology Track): Lecture; 4.00 points
Life Sciences (ExCLS Track): Lecture; Elective; 4.00 points

Comments

Not open for auditors
This course will be held in a hybrid format on Mondays, 16:00-20:00, in Room A, FGS
Exception: the 25.12 session is moved to 26.12, 16:00-20:00, in Room A

Prerequisites

Prior knowledge of and experience with the Python libraries NumPy, Pandas and Matplotlib are required for this course.

Restrictions

20

Language of Instruction

English

Attendance and participation

Required in at least 80% of the lectures

Grade Type

Numerical (out of 100)

Grade Breakdown (in %)

100%

Evaluation Type

Final assignment

Scheduled date 1

N/A
N/A
-
N/A

Estimated Weekly Independent Workload (in hours)

6

Syllabus

In this course, we will look under the hood of ML algorithms and learn the steps and best
practices in the workflow of ML projects. We will learn how to implement these in Python by
studying Python programming concepts for ML with the Scikit-Learn library and by reviewing code
implementations. For the final assignment, each group will work independently, with guidance
from the lecturer, and will hand in a detailed report and code for assessment.

Class Topics 

1 Intro to ML with Emphasis on Applications in Healthcare and Intro to Python Scikit-Learn library
- What ML is and why to use it
- Types of ML
- Main Challenges of training and best practices to address them
- Applications of ML in healthcare
- The structure and features of Electronic Health Record data
- Sources of Medical Data
- Unique challenges in healthcare
- The potential of ML in advancing clinical decision support systems
- The Scikit-Learn library and the principles of object-oriented programming it is based on

2 Linear Regression
- Linear and polynomial regression
- Regularized linear models
- Logistic and softmax regression

3 Classification
- Binary, multiclass, multilabel and multioutput
- Performance measures
- Changing the decision threshold
- Model calibration
- Upsampling and undersampling (SMOTE & GANs)
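
To illustrate the "changing the decision threshold" topic above, here is a minimal, dependency-free sketch of how precision and recall trade off as the threshold moves. The labels, scores and the helper name `precision_recall_at` are hypothetical and chosen for illustration only; in the course setting, Scikit-Learn's metrics would normally be used instead.

```python
def precision_recall_at(threshold, y_true, y_score):
    """Precision and recall when scores >= threshold are predicted positive."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.35, 0.40, 0.45, 0.60, 0.70, 0.80, 0.90]

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall_at(threshold, y_true, y_score)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold catches more true positives (higher recall) at the cost of more false alarms (lower precision), which is often the relevant trade-off in clinical screening settings.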

4 Support Vector Machines
- Linear and non-linear kernel SVMs for classification and regression

5 Decision Trees
- Classification and regression decision trees
- Pre-pruning vs. post-pruning

6-7 Ensemble Methods and Fine-Tuning
- Voting Classifiers
- Bagging and Pasting
- Random Forest
- Gradient Boosting
- XGBoost, CatBoost and LightGBM
- How to fine-tune ensemble models (incl. grid and random search, Optuna, early stopping, cross-validation and GPU acceleration)

8-10 ML Project Workflow
- Tidy data requirements
- Splitting into train, validation and test sets
- Exploratory data analysis
- Data preparation (incl. Scikit-Learn transformation pipelines, imputation methods and how to handle multiple observations from the same patients)
- Training and comparing performance between models
- Hyperparameter tuning
- Feature importance for model explainability (Gini, SHAP and LIME)
- Error analysis
- Evaluation
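
One way to handle multiple observations from the same patient, as listed above, is to split by patient rather than by row, so that no patient contributes records to both the training and the test set. The sketch below is a dependency-free illustration under that assumption; the function name `group_train_test_split` and the patient IDs are hypothetical. Scikit-Learn offers group-aware splitters such as `GroupShuffleSplit` for the same purpose.

```python
import random


def group_train_test_split(patient_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no patient appears in both sets."""
    unique_patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_patients)
    # Hold out whole patients, at least one
    n_test = max(1, round(len(unique_patients) * test_fraction))
    held_out = set(unique_patients[:n_test])
    train_idx = [i for i, pid in enumerate(patient_ids) if pid not in held_out]
    test_idx = [i for i, pid in enumerate(patient_ids) if pid in held_out]
    return train_idx, test_idx


# Hypothetical record-level patient IDs (several visits per patient)
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p4"]
train_idx, test_idx = group_train_test_split(patient_ids)

train_patients = {patient_ids[i] for i in train_idx}
test_patients = {patient_ids[i] for i in test_idx}
print("train patients:", sorted(train_patients))
print("test patients:", sorted(test_patients))
```

Splitting by row instead would let visits from one patient land on both sides of the split, which tends to inflate test performance.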


11-12 Graph Neural Networks
- Neural Networks
- Graph Neural Networks

13 Supervised, Weakly and Unsupervised Learning and Anomaly Detection
- PCA, IPCA, t-SNE, DBSCAN, UMAP and K-means
- Applications of weak supervision to increase the amount of labelled data
- Anomaly detection methods: supervised/unsupervised and proximity-, distribution- and ensemble-based
- Including HBOS, Isolation Forest, GMM, PCA, KNN, LOF, CBLOF, XGBOD and autoencoders

14 Time Series Forecasting
- Introduction to time series
- Time series forecasting and anomaly detection on physiological data with GBDTs

Learning Outcomes

Upon successful completion of this course, students should be able to:

- Complete machine learning projects end to end using tabular data.
- Explain the theory behind ML algorithms.
- Apply best practices, having completed an end-to-end data science research project
(in groups of 2-3 students) using a large electronic health record dataset.
- Program in Python for ML, including object-oriented concepts and the use of GPUs.

Reading List

Aurélien Géron. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow,
O'Reilly Media, Inc.

Website

N/A