Course Identification

Title:

Statistical inference in the biological sciences

Code:

20253612

Lecturers and Teaching Assistants

Lecturers:

Dr. Justin Bois

TAs:

N/A

Course Schedule and Location

Year:

2025

Semester:

Second Semester

When / Where:

09:00 - 15:00, WSoS, Rm C

First Lecture:

07/09/2025

End date:

18/09/2025

Field of Study, Course Type and Credit Points

Life Sciences: Lecture; Elective; 3.00 points

Comments

This course will be held in person only.

Prerequisites

Students should have experience programming in Python, which is the programming language used throughout the course. Some background in linear algebra and calculus is helpful, though students who are rusty will still be fine!

Restrictions

Participants:

20

Language of Instruction

English

Attendance and participation

Obligatory

Grade Type

Numerical (out of 100)

Grade Breakdown (in %)

Attendance:

50%

Final:

50%

Evaluation Type

Final assignment

Scheduled date 1

Date / due date

N/A

Location

N/A

Time

N/A

Remarks

N/A

Estimated Weekly Independent Workload (in hours)

1

Syllabus

Day 1

9-9:30: Lesson 0: Welcome and lecture on analysis pipelines

9:35-10:25: Lesson 1: Introduction to data frames and Polars

10:35-11:25: Exercise 1: Joining data frames, using filter and select contexts

11:35-12:25: Lesson 2: Split-apply-combine

12:30-13:00: Exercise 2: Wrangling, split-apply-combine with the group by context

Day 2

9-9:30: Presentations of solutions of exercises 1 and 2

9:35-10:25: Lesson 3: The Python plotting landscape and Bokeh

10:35-11:25: Exercise 3: Making scatter plots with Bokeh

11:35-12:25: Lesson 4: ECDFs and plots of univariate data, wrangling and plotting finch beak data

12:30-13:00: Exercise 4: Plotting with iqplot

Day 3

9-9:30: Presentations of solutions of exercises 3 and 4

9:35-10:25: Lesson 5: Probability and probability distributions

10:35-11:25: Exercise 5: Derive the PDF for time of microtubule catastrophe, simulate a distribution

11:35-12:25: Lesson 6: Generative modeling

12:30-13:00: Exercise 6: You already can build generative models!

Day 4

9-9:30: Presentations of solutions of exercises 5 and 6

9:35-10:25: Lesson 7: Introduction to Bayesian modeling

10:35-11:25: Exercise 7: Practice with Bayesian modeling

11:35-12:25: Lesson 8: Parameter estimation by optimization

12:30-13:00: Exercise 8: Parameter estimation by optimization

Day 5

9-9:30: Presentations of solutions of exercises 7 and 8

9:35-10:25: Lesson 9: Markov chain Monte Carlo and Stan

10:35-11:25: Exercise 9: First foray into MCMC

11:35-12:25: Lesson 10: Display of MCMC results and diagnostics

12:30-13:00: Exercise 10: Bayesian inference with MCMC

Day 6

9-9:30: Presentations of solutions of exercises 9 and 10

9:35-10:25: Lesson 11: Display of MCMC results and diagnostics

10:35-11:25: Exercise 11: Yet more inference problems

11:35-12:25: Lesson 12: Mixture models, the EM algorithm, and identifiability

12:30-13:00: Exercise 12: Inference with mixture models

Day 7

9-9:30: Presentations of solutions of exercises 11 and 12

9:35-10:25: Lesson 13: Variate-covariate models

10:35-11:25: Exercise 13: Inference with variate-covariate models

11:35-12:25: Lesson 14: Variate-covariate models

12:30-13:00: Exercise 14: Inference with variate-covariate models

Day 8

9-9:30: Presentations of solutions of exercises 13 and 14

9:35-10:25: Lesson 15: Hierarchical models

10:35-11:25: Exercise 15: Inference with hierarchical models

11:35-12:25: Lesson 16: Principled analysis pipelines

12:30-13:00: Exercise 16: Practice taking principled approaches

Day 9

9-9:30: Presentations of solutions of exercises 15 and 16

9:35-10:25: Lesson 17: Gaussian processes

10:35-11:25: Exercise 17: Implementation of GPs

11:35-12:25: Lesson 18: Hidden Markov models

12:30-13:00: Exercise 18: Implementation of HMMs

Day 10

9-9:30: Presentations of solutions of exercises 17 and 18

9:35-10:25: Lesson 19: Dimensionality reduction from a Bayesian perspective

10:35-11:25: Exercise 19: Factor analysis and (probabilistic) PCA

11:35-12:25: Lesson 20: Review and wrap-up

12:30-13:00: Exercise 20: Discussion on using what we've learned in research applications

Learning Outcomes

The primary learning objective is to empower and encourage students to develop robust data analysis pipelines. On a more fine-grained level, students will learn to:

- Organize and visualize data sets using the Python ecosystem, with heavy use of Polars and Bokeh.
- Build generative models for data production. (Specific models include logistic regression, variate-covariate models, hierarchical models, hidden Markov models, and Gaussian processes.)
- Perform inference of parameter values in a Bayesian setting via numerical optimization, Markov chain Monte Carlo, and other approximate sampling methods.
- Develop and implement principled analysis pipelines for robust inference and model assessment.

Reading List

N/A

Website

N/A