Course Identification

Principles and practice of large scale data analysis using R
20253181

Lecturers and Teaching Assistants

Dr. Yaron Antebi, Prof. Igor Ulitsky, Prof. Schraga Schwartz
Noa Frydman, Edo Kiper, Dr. Nisan Feigin, Sharon Samuel

Course Schedule and Location

2025
First Semester
Sunday, 14:15 - 16:00, WSoS, Rm C

Tutorials
Tuesday, 12:00 - 14:00, Wolfson Auditorium
03/11/2024
26/01/2025

Field of Study, Course Type and Credit Points

Life Sciences: Lecture; Elective; Regular; 3.00 points
Chemical Sciences: Lecture; 3.00 points
Life Sciences (Brain Sciences: Systems, Computational and Cognitive Neuroscience Track): Lecture; 3.00 points

Comments

In the second week of January, the course will be held on Wednesday, January 15, 14:15 - 16:00, at WSoS, Rm C.

Prerequisites

No

Restrictions

80

Language of Instruction

English

Attendance and participation

Expected and Recommended

Grade Type

Numerical (out of 100)

Grade Breakdown (in %)

50%
50%

Evaluation Type

Final assignment

Scheduled date 1

N/A

Estimated Weekly Independent Workload (in hours)

4

Syllabus

  1. Introduction to the R programming environment.

  2. Basic programming in R: introduction to variables, data.frames, lists, loops, conditional operators, and functions.

  3. Visualization of one- and two-dimensional data using base R graphics and ggplot2.

  4. First steps in data analysis: loading, filtering, subsetting, and examining correlations. Statistical tests for differences (t-test, Wilcoxon, Kolmogorov-Smirnov) and how to run them in R. Application to RNA-seq data.

  5. Control flow, conditionals, loops (for, apply, lapply, tapply, sapply, mapply), and functions.

  6. Text and regular expressions, grep, basic sequence analysis (seqLogo package).

  7. Merging datasets, e.g., how to combine peaks from ChIP-seq with RNA-seq data in various ways.

  8. Multi-dimensional data: normalization, clustering (hierarchical/biclustering), PCA, and visualization.

  9. Bioconductor: introduction and sample packages (e.g., seqLogo, edgeR). Models for the significance of differential expression in RNA-seq data.

  10. Building interactive interfaces using Shiny.

  11. Machine learning: what it means and what classifiers are. ROC curves. Feature selection. Pitfalls to be aware of. How to train a simple SVM. Application to the RNA-seq data. Cross-validation, etc. Visualization of ROC curves.

  12. Modeling biological data.

Learning Outcomes

Upon successful completion of this course, students will be able to use the R programming language and supporting tools to analyze genomic datasets such as RNA-seq and extract biological insights from them.

Reading List

N/A

Website

N/A