Course Identification

Principles and practice of large scale data analysis using R
20243361

Lecturers and Teaching Assistants

Dr. Yaron Antebi, Prof. Igor Ulitsky, Prof. Schraga Schwartz
Rotem Tal, Einav Somech, Dr. Rony Chanoch, Edo Kiper

Course Schedule and Location

2024
First Semester
Sunday, 14:15 - 16:00, FGS, Rm C

Tutorials
Tuesday, 12:00 - 14:00, FGS, Rm C
17/12/2023
03/03/2024

Field of Study, Course Type and Credit Points

Life Sciences: Lecture; Elective; Regular; 3.00 points
Chemical Sciences: Lecture; Elective; Regular; 3.00 points
Life Sciences (Brain Sciences: Systems, Computational and Cognitive Neuroscience Track): Lecture; Elective; Regular; 3.00 points
Life Sciences (Computational and Systems Biology Track): Lecture; Obligatory; Core; 3.00 points
Life Sciences (ExCLS Track): Lecture; Elective; Regular; 3.00 points

Comments

This course will be held in a hybrid format.
The first session will be held on 11.12, 14:15-16:00, in Room C at FGS.

Prerequisites

None

Restrictions

80

Language of Instruction

English

Attendance and participation

Expected and Recommended

Grade Type

Numerical (out of 100)

Grade Breakdown (in %)

50%
50%

Evaluation Type

Final assignment

Scheduled date 1

N/A
N/A
-
N/A

Estimated Weekly Independent Workload (in hours)

3

Syllabus

  1. Introduction to the R programming environment.

  2. Basic programming in R: introduction to variables, data.frames, lists, loops, conditional operators, and functions.

  3. Visualization of one- and two-dimensional data using the base graphics package and ggplot2.

  4. First steps in data analysis: loading, filtering, subsetting, and looking at correlations. Statistical tests for differences (t-test, Wilcoxon, Kolmogorov-Smirnov) and how they are performed in R. Application to RNA-seq data.

  5. Control flow: conditionals, loops (for) and the apply family (apply, lapply, tapply, sapply, mapply), and functions.

  6. Text and regular expressions, grep, and basic sequence analysis (seqLogo package).

  7. Merging datasets, e.g., how to combine ChIP-seq peaks with RNA-seq data in various ways.

  8. Multi-dimensional data: normalization, clustering (hierarchical/biclustering), PCA, and visualization.

  9. Bioconductor: introduction and some sample packages (e.g., seqLogo, edgeR). Models for the significance of differential expression in RNA-seq data.

  10. Building interactive interfaces using Shiny.

  11. Machine learning: what it means and what classifiers are. ROC curves and their visualization. Feature selection. Pitfalls to be aware of. How to train a simple SVM. Cross-validation. Application to RNA-seq data.

  12. Modeling biological data.

Learning Outcomes

Upon successful completion of this course, students will be able to use the R programming language and supporting tools to analyze genomic datasets such as RNA-seq and extract biological insights from them.

Reading List

N/A

Website

N/A