Course Identification

Introduction to Data Analysis in R
20205081

Lecturers and Teaching Assistants

Dr. Giora Alexandron
Dr. Tanya Nazaretsky

Course Schedule and Location

2020
First Semester
Thursday, 09:00 - 10:30, Musher, Lab 3
07/11/2019

Field of Study, Course Type and Credit Points

Science Teaching: Lecture; Elective; Regular; 2.00 points

Comments

On 12 December and 26 December, the course will be held in Musher, Lab 2.

Prerequisites

No

Restrictions

20

Language of Instruction

Hebrew

Attendance and participation

Expected and Recommended

Grade Type

Numerical (out of 100)

Grade Breakdown (in %)

10%
40%
45%
5%

Evaluation Type

Final assignment

Scheduled date 1

N/A
N/A
-
N/A

Estimated Weekly Independent Workload (in hours)

2

Syllabus

Data analysis is becoming a fundamental competency in educational research. R is a free programming language for statistical computing that is widely used by researchers and data scientists for data analysis and visualization. The course covers the basics of data analysis in R. Evaluation is based mainly on assignments and a final project.

Below is a list of the main topics that the course will touch upon (not necessarily in this order):

  • Preliminaries – the very basics of R:
    1. Introduction to variables, data structures and their representation in R (variables, vectors, lists, matrices, data frames)
    2. Input/output: loading data files, saving results to file
  • Data Programming:
    • Subsetting and splitting data
    • Control flow: Conditions and loops
    • Merging datasets
    • The apply family of functions: applying a function to every element of a list or vector without writing an explicit loop
    • Data Cleaning: Detecting and handling incomplete, incorrect, inaccurate or irrelevant parts of the data (e.g., missing values, outliers, etc.)
  • Learning from examples: Using online forums and code bases to retrieve programming solutions
  • Applied Statistics:
    • Descriptive Statistics
    • Hypothesis testing (t.test, Wilcoxon), parametric and non-parametric statistics, bootstrap hypothesis testing
    • Basics of Supervised Machine Learning: Linear and logistic regression*
    • Basics of Unsupervised Machine Learning: Cluster analysis*
  • Visualizing Research Results (plot, barplot, boxplot, …)
  • Building interactive interfaces with Shiny*

* Advanced topics; coverage depends on students’ progress in the course

Learning Outcomes

Upon successful completion of this course, students will be able to use R to analyze and extract insights from structured datasets, and use R’s visualization capabilities to report and communicate research findings in presentations and papers.

Reading List

Harvard’s edX MOOC on R basics: https://www.edx.org/course/data-science-r-basics-2

Stack Overflow: a Q&A forum for programmers

The R Project for Statistical Computing – the homepage of the R project.

An Introduction to Statistical Learning with Applications in R – a comprehensive textbook by James, Witten, Hastie and Tibshirani.

Website