Course Identification

Statistical inference in the biological sciences
20253612

Lecturers and Teaching Assistants

Dr. Justin Bois
N/A

Course Schedule and Location

2025
Second Semester
09:00 - 15:00, WSoS, Rm C
07/09/2025
18/09/2025

Field of Study, Course Type and Credit Points

Life Sciences: Lecture; Elective; 3.00 points

Comments

This course will be in-person only

Prerequisites

Students should have experience programming in Python, which will be the language of instruction. Some background in linear algebra and calculus is helpful, though students who are rusty will still be fine!

Restrictions

20

Language of Instruction

English

Attendance and participation

Obligatory

Grade Type

Numerical (out of 100)

Grade Breakdown (in %)

50%
50%

Evaluation Type

Final assignment

Scheduled date 1

N/A
N/A
-
N/A

Estimated Weekly Independent Workload (in hours)

1

Syllabus

Day 1

9-9:30: Lesson 0: Welcome and lecture on analysis pipelines

9:35-10:25: Lesson 1: Introduction to data frames and Polars

10:35-11:25: Exercise 1: Joining data frames, using filter and select contexts

11:35-12:25: Lesson 2: Split-apply-combine

12:30-13:00: Exercise 2: Wrangling, split-apply-combine with the group by context

 

Day 2

9-9:30: Presentations of solutions of exercises 1 and 2

9:35-10:25: Lesson 3: The Python plotting landscape and Bokeh

10:35-11:25: Exercise 3: Making scatter plots with Bokeh

11:35-12:25: Lesson 4: ECDFs and plots of univariate data, wrangling and plotting finch beak data

12:30-13:00: Exercise 4: Plotting with iqplot

 

Day 3

9-9:30: Presentations of solutions of exercises 3 and 4

9:35-10:25: Lesson 5: Probability and probability distributions

10:35-11:25: Exercise 5: Derive the PDF for the time of microtubule catastrophe, simulate a distribution

11:35-12:25: Lesson 6: Generative modeling

12:30-13:00: Exercise 6: You already can build generative models!

 

Day 4

9-9:30: Presentations of solutions of exercises 5 and 6

9:35-10:25: Lesson 7: Introduction to Bayesian modeling

10:35-11:25: Exercise 7: Practice with Bayesian modeling

11:35-12:25: Lesson 8: Parameter estimation by optimization

12:30-13:00: Exercise 8: Parameter estimation by optimization

 

Day 5

9-9:30: Presentations of solutions of exercises 7 and 8

9:35-10:25: Lesson 9: Markov chain Monte Carlo and Stan

10:35-11:25: Exercise 9: First foray into MCMC

11:35-12:25: Lesson 10: Display of MCMC results and diagnostics

12:30-13:00: Exercise 10: Bayesian inference with MCMC

 

Day 6

9-9:30: Presentations of solutions of exercises 9 and 10

9:35-10:25: Lesson 11: Display of MCMC results and diagnostics

10:35-11:25: Exercise 11: Yet more inference problems

11:35-12:25: Lesson 12: Mixture models, the EM algorithm, and identifiability

12:30-13:00: Exercise 12: Inference with mixture models

 

Day 7

9-9:30: Presentations of solutions of exercises 11 and 12

9:35-10:25: Lesson 13: Variate-covariate models

10:35-11:25: Exercise 13: Inference with variate-covariate models

11:35-12:25: Lesson 14: Variate-covariate models

12:30-13:00: Exercise 14: Inference with variate-covariate models

 

Day 8

9-9:30: Presentations of solutions of exercises 13 and 14

9:35-10:25: Lesson 15: Hierarchical models

10:35-11:25: Exercise 15: Inference with hierarchical models

11:35-12:25: Lesson 16: Principled analysis pipelines

12:30-13:00: Exercise 16: Practice taking principled approaches

 

Day 9

9-9:30: Presentations of solutions of exercises 15 and 16

9:35-10:25: Lesson 17: Gaussian processes

10:35-11:25: Exercise 17: Implementation of GPs

11:35-12:25: Lesson 18: Hidden Markov models

12:30-13:00: Exercise 18: Implementation of HMMs

 

Day 10

9-9:30: Presentations of solutions of exercises 17 and 18

9:35-10:25: Lesson 19: Dimensionality reduction from a Bayesian perspective 

10:35-11:25: Exercise 19: Factor analysis and (probabilistic) PCA 

11:35-12:25: Lesson 20: Review and wrap-up

12:30-13:00: Exercise 20: Discussion on using what we've learned in research applications

 

Learning Outcomes

The primary learning objective is to empower and encourage students to develop robust data analysis pipelines. On a more fine-grained level, students will learn to:

- Organize and visualize data sets using the Python ecosystem, with heavy use of Polars and Bokeh.
- Build generative models for data production. (Specific models include logistic regression, variate-covariate models, hierarchical models, hidden Markov models, and Gaussian processes.)
- Perform inference of parameter values in a Bayesian setting via numerical optimization, Markov chain Monte Carlo, and other approximate sampling methods.
- Develop and implement principled analysis pipelines for robust inference and model assessment.

Reading List

N/A

Website

N/A