Data analysis is becoming a fundamental competency in educational research. R is a free programming language for statistical computing that is widely used by researchers and data scientists for data analysis and visualization. This course covers the basics of data analysis in R. Evaluation will be based mainly on assignments and a final project.
Below is a list of the main topics that the course will touch upon (not necessarily in this order):
- Preliminaries – the very basics of R:
- Introduction to variables and data structures and their representation in R (vectors, lists, matrices, data frames)
- Input/output: loading data files, saving results to file
- Data Programming:
- Subsetting and splitting data
- Control flow: Conditions and loops
- Merging datasets
- The apply family of functions: Applying a function to each element of a list or vector without writing an explicit loop
- Data Cleaning: Detecting and handling incomplete, incorrect, inaccurate or irrelevant parts of the data (e.g., missing values and outliers)
- Learning from examples: Using online forums and code bases to retrieve programming solutions
- Applied Statistics:
- Descriptive Statistics
- Hypothesis testing (t-test, Wilcoxon test), parametric and non-parametric statistics, bootstrap hypothesis testing
- Basics of Supervised Machine Learning: Linear and logistic regression*
- Basics of Unsupervised Machine Learning: Cluster analysis*
- Visualizing Research Results (plot, barplot, boxplot, …)
- Building interactive interfaces with Shiny*
* Advanced topics; coverage depends on students’ progress in the course.
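
As a small illustration of the apply-family topic listed above, the sketch below applies one function to every element of a list, first with `lapply()` and `sapply()` and then with an explicit loop for comparison. The `scores` list and its class names are hypothetical data invented for this example; they are not course material.

```r
# Hypothetical data: test scores for three classes (invented for illustration)
scores <- list(
  class_a = c(70, 85, 90),
  class_b = c(60, 75),
  class_c = c(88, 92, 79, 81)
)

# lapply() applies a function to each element and returns a list
means_list <- lapply(scores, mean)

# sapply() does the same but simplifies the result to a named numeric vector
means_vec <- sapply(scores, mean)

# Equivalent explicit loop, shown for comparison
means_loop <- numeric(length(scores))
names(means_loop) <- names(scores)
for (name in names(scores)) {
  means_loop[name] <- mean(scores[[name]])
}

print(means_vec)
```

Both approaches give the same per-class means; the apply version is shorter and keeps the element names without extra bookkeeping.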