In this course, we will look under the hood of ML algorithms, learn the steps and best practices in the workflow of ML projects, and see how to implement them in Python by studying programming concepts for ML with the Scikit-Learn library and by reviewing code implementations. Each group will work independently, with guidance from the lecturer, and will hand in a detailed report and code for assessment.
Class Topics
1 Intro to ML with Emphasis on Applications in Healthcare and Intro to the Python Scikit-Learn Library: What ML is and why to use it, Types of ML, Main challenges of training and best practices to address them, Applications of ML in healthcare, The structure and features of Electronic Health Record data, Sources of medical data, Unique challenges in healthcare, The potential of ML in advancing clinical decision support systems, The Scikit-Learn library and the principles of object-oriented programming it is based on (code sketch after the list)
2 Linear Regression: Linear and polynomial regression, Regularized linear models, Logistic and softmax regression (code sketch after the list)
3 Classification: Binary, multiclass, multilabel and multioutput, Performance measures, Changing the decision threshold, Model calibration, Upsampling and undersampling (SMOTE & GANs) (code sketch after the list)
4 Support Vector Machines: Linear and non-linear kernel SVMs for classification and regression (code sketch after the list)
5 Decision Trees: Classification and regression decision trees, Pre-pruning vs. post-pruning (code sketch after the list)
6-7 Ensemble Methods and Fine-Tuning: Voting classifiers, Bagging and pasting, Random Forest, Gradient Boosting, XGBoost, CatBoost and LightGBM, How to fine-tune ensemble models (incl. grid & random search, Optuna, early stopping, cross-validation and GPU acceleration) (code sketch after the list)
8-10 ML Project Workflow: Tidy data requirements, Splitting into train, validation and test sets, Exploratory data analysis, Data preparation (incl. Scikit-Learn transformation pipelines, imputation methods and how to handle multiple observations from the same patient), Training and comparing performance between models, Hyperparameter tuning, Feature importance for model explainability (Gini, SHAP and LIME), Error analysis, Evaluation (code sketch after the list)
11-12 Graph Neural Networks: Neural Networks and Graph Neural Networks
13 Supervised, Weakly Supervised and Unsupervised Learning and Anomaly Detection: PCA, IPCA, t-SNE, DBSCAN, UMAP and K-means, Applications of weak supervision to increase the amount of labelled data, Supervised/unsupervised and proximity/distribution/ensemble-based anomaly detection methods, including HBOS, Isolation Forest, GMM, PCA, KNN, LOF, CBLOF, XGBOD and Autoencoders (code sketch after the list)
14 Time Series Forecasting: Introduction to time series, Time series forecasting and anomaly detection on physiological data with GBDTs (code sketch after the list)
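The short sketches below illustrate, for selected classes, the kind of Scikit-Learn code the course reviews. The datasets, parameter values and thresholds are illustrative assumptions, not course material.

For class 1, a minimal sketch of the Scikit-Learn estimator interface and the object-oriented design behind it: a model is an object configured in its constructor, trained with fit() and queried with predict()/score().

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)                 # built-in clinical-style example dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=10000)                    # estimator object; hyperparameters set in the constructor
clf.fit(X_train, y_train)                                   # learn model parameters from the training data
print(clf.score(X_test, y_test))                            # accuracy on the held-out test set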
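For class 2, a sketch (on assumed synthetic data) of polynomial regression combined with a regularized linear model (ridge):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.5, size=200)      # noisy quadratic signal

model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))   # polynomial expansion + L2 penalty
model.fit(X, y)
print(model.predict([[1.5]]))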
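For class 3, a sketch of changing the decision threshold of a probabilistic classifier on an imbalanced dataset; the 0.3 threshold is an illustrative choice:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)   # imbalanced classes
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]           # predicted probability of the positive class
y_pred = (proba >= 0.3).astype(int)               # lowering the threshold trades precision for recall
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred))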
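For class 4, a sketch of a non-linear (RBF-kernel) SVM classifier on synthetic data:

from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)             # non-linearly separable toy data
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_clf.fit(X, y)
print(svm_clf.score(X, y))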
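For class 5, a sketch contrasting pre-pruning (limiting tree depth during growth) with post-pruning (cost-complexity pruning via ccp_alpha); the parameter values are arbitrary examples:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pre_pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)      # stop growth early
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)  # grow fully, then prune
print(pre_pruned.score(X_test, y_test), post_pruned.score(X_test, y_test))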
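For classes 6-7, a sketch of cross-validated grid search over an ensemble model; the search space is illustrative only:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}   # illustrative search space
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)                                   # 5-fold cross-validation for every grid point
print(search.best_params_, round(search.best_score_, 3))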
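For classes 8-10, a sketch of a Scikit-Learn transformation pipeline with imputation and scaling, combined with a grouped split so that repeated observations from the same (simulated) patient never land in both the train and test sets:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
X[rng.random(X.shape) < 0.1] = np.nan             # simulate missing values
y = rng.integers(0, 2, size=300)
patient_id = rng.integers(0, 60, size=300)        # several rows per simulated patient

pipe = Pipeline([("impute", SimpleImputer(strategy="median")),
                 ("scale", StandardScaler()),
                 ("model", LogisticRegression(max_iter=1000))])

splitter = GroupShuffleSplit(test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_id))     # no patient appears in both sets
pipe.fit(X[train_idx], y[train_idx])
print(pipe.score(X[test_idx], y[test_idx]))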
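For class 13, a sketch of unsupervised anomaly detection with Isolation Forest (one of the detectors listed above) on synthetic data:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 2))                 # bulk of the data
outliers = rng.uniform(low=-6, high=6, size=(20, 2))
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.04, random_state=0).fit(X)
labels = iso.predict(X)                            # +1 = inlier, -1 = flagged anomaly
print((labels == -1).sum(), "points flagged as anomalies")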
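For class 14, a sketch of framing time series forecasting as supervised learning with lag features and a gradient-boosted tree model; the synthetic sine wave stands in for physiological data:

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
t = np.arange(1000)
series = np.sin(2 * np.pi * t / 50) + 0.1 * rng.normal(size=t.size)     # synthetic periodic signal

n_lags = 10
X = np.column_stack([series[i:-(n_lags - i)] for i in range(n_lags)])   # previous 10 values as features
y = series[n_lags:]                                                     # one-step-ahead target

split = int(0.8 * len(y))                                               # chronological train/test split
model = HistGradientBoostingRegressor().fit(X[:split], y[:split])
print(model.score(X[split:], y[split:]))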