Course Identification

Principles and practice of large scale data analysis using R - group 2
20223471

Lecturers and Teaching Assistants

N/A

Course Schedule and Location

2022
First Semester
N/A
27/10/2021
18/03/2022

Field of Study, Course Type and Credit Points

Life Sciences: Lecture; Elective; Regular; 3.00 points
Chemical Sciences: Lecture; 2.00 points
Life Sciences (Molecular and Cellular Neuroscience Track): Lecture; 3.00 points
Life Sciences (Brain Sciences: Systems, Computational and Cognitive Neuroscience Track): Lecture; 3.00 points
Life Sciences (Computational and Systems Biology Track): Lecture; 3.00 points

Comments

N/A

Prerequisites

No

Restrictions

80

Language of Instruction

English

Attendance and participation

Expected and Recommended

Grade Type

Pass / Fail

Grade Breakdown (in %)

50%
50%

Evaluation Type

Final assignment

Scheduled date 1

N/A
N/A
-
N/A

Estimated Weekly Independent Workload (in hours)

2

Syllabus

  1. Introduction to the R programming environment.

  2. Basic programming in R: introduction to variables, data.frames, lists, loops, conditional operators, functions.

  3. Visualization of one- and two-dimensional data using base R graphics and ggplot2.

  4. First steps in data analysis: loading, filtering, subsetting, and examining correlations. Statistical tests for differences (t-test, Wilcoxon, KS) and how they are run in R. Application to the RNA-seq data.

  5. Control flow, conditionals, loops (for, apply, lapply, tapply, sapply, mapply), and functions.

  6. Text and regular expressions, grep, basic sequence analysis (seqLogo package).

  7. Merging datasets - e.g. how to combine peaks from ChIP-seq with RNA-seq data in various ways.

  8. Multi-dimensional data: normalization, clustering (hierarchical/biclustering), PCA, and visualization.

  9. Bioconductor: introduction, some sample packages (seqLogo? edgeR?). Models for significance of differential expression in RNA-seq data.

  10. Building interactive interfaces using Shiny

  11. Machine learning: what it means, what classifiers are. ROC curves. Feature selection. What to be aware of. How to train a simple SVM. Application to the RNA-seq data. Cross-validation, etc. Visualization of ROC curves.

  12. Modeling biological data.
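
As a rough illustration of the kind of exercise topics 4 to 7 lead to, the sketch below uses only base R. The data are simulated, and every name in it (`expr`, `peaks`, `peak_score`, the `gene1`..`gene6` labels) is a hypothetical placeholder, not course material or a real dataset.

```r
# Hypothetical toy data: simulated log-expression values, not course data.
set.seed(1)
expr <- data.frame(
  gene      = paste0("gene", 1:6),
  control   = rnorm(6, mean = 5),
  treatment = rnorm(6, mean = 6)
)

# Topic 4: subsetting, correlation, and two-sample tests.
above_median <- expr[expr$control > median(expr$control), ]
r  <- cor(expr$control, expr$treatment)              # Pearson correlation
tt <- t.test(expr$treatment, expr$control)           # Welch t-test
wt <- wilcox.test(expr$treatment, expr$control)      # Wilcoxon rank-sum test
ks <- ks.test(expr$treatment, expr$control)          # Kolmogorov-Smirnov test
p_values <- c(t = tt$p.value, wilcoxon = wt$p.value, ks = ks$p.value)

# Topic 5: apply-family alternative to an explicit for loop.
col_means <- sapply(expr[, c("control", "treatment")], mean)

# Topic 6: regular expressions on gene identifiers.
valid_id <- grepl("^gene[0-9]+$", expr$gene)

# Topic 7: merging expression values with a (hypothetical) table of
# ChIP-seq peak scores keyed by gene name.
peaks <- data.frame(
  gene       = c("gene2", "gene4", "gene9"),
  peak_score = c(10.5, 3.2, 7.8)
)
inner <- merge(expr, peaks, by = "gene")                # genes in both tables
left  <- merge(expr, peaks, by = "gene", all.x = TRUE)  # keep all expr genes
```

Here `merge()` with default arguments keeps only genes present in both tables, while `all.x = TRUE` keeps every expression row and fills missing peak scores with `NA`. Which of the two is appropriate depends on the question being asked of the combined data.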

Learning Outcomes

Upon successful completion of this course, students will be able to use the R programming language and its supporting tools to analyze genomic datasets such as RNA-seq and extract biological insights from them.

Reading List

N/A

Website

N/A