Introduction to Data Analysis in R

Course Syllabus - Spring 2024


Instructor: Dr. Geoff Zahn

Office: SB242t

Course GitHub Repository: https://github.com/gzahn/Data_Course

Course Website: https://gzahn.github.io/data-course/

You need to bring your own laptop to class every day. That way you have installation privileges and you will be set up for the future as you continue to analyze data.


Course Description

This course provides an introduction to analyzing data in the R software environment, assuming no previous R experience.

We cover:

  • the justification for using code to explore and analyze data
  • best practices for dealing with data
  • experimental design, modeling, and hypothesis testing
  • how to create publication-quality figures that show interesting relationships in our data sets
  • project organization and proper reporting

Pre-requisites: Advanced standing or instructor permission

Learning Outcomes

Students completing this course should be able to:

  1. Demonstrate proficiency in proper data entry, management, and storage with an emphasis on reproducibility.
  2. Convert untidy data to “tidy data” for analyses.
  3. Explain the basic principles of exploratory data analyses within a computational software environment.
  4. Evaluate the rationale behind using code to analyze data and present results.
  5. Develop computational skills for processing common biological data formats, such as DNA sequences.
  6. Create appropriate and meaningful data visualizations using the R software environment.
  7. Integrate principles of experimental design, statistical modeling, hypothesis testing, and data visualization.

Advice

This course requires significant outside work… meaning that if you aren’t practicing at least 6 hours a week outside of class, then you are going to have a really rough time of it. Like, you’ll fall behind and will be frustrated and scared and sad. Don’t let this happen. Practice! Come to my office! Ask questions! And spend time every day using R. Seriously. Every day.

The course website has external resources and practice tasks for every week. Use these to make the most of this opportunity. Data fluency is currently one of the most desirable job skills, including in biology. Most undergrad science students do not learn these skills, which puts you at a major competitive advantage if you make the effort to learn them.

Computers are not a passing fad. Don’t get left behind.


Grading

We will use a point system. The points you have accumulated by the end of the course determine your grade as follows:

Points     Letter Grade
700-800    A
640-699    B
560-639    C
480-559    D
<480       E

Points are based on:

  • 10 Assignments - 20 pts each
  • 4 Skills Tests - 100 pts each
  • 1 Final Project - 200 pts
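
As a quick check on the arithmetic, the point categories above add up to the 800-point maximum in the grade table. The sketch below is only an illustration in base R: the `letter_grade` function and the example scores are hypothetical, not part of the course materials, and it assumes the cutoffs apply exactly as listed in the table.

```r
# Maximum points per category, taken from the list above
max_points <- c(assignments   = 10 * 20,
                skills_tests  = 4 * 100,
                final_project = 1 * 200)
total_possible <- sum(max_points)  # 800

# Hypothetical helper: map a point total to a letter grade
# using the cutoffs in the grade table (lower bound inclusive)
letter_grade <- function(points) {
  as.character(cut(points,
                   breaks = c(-Inf, 480, 560, 640, 700, Inf),
                   labels = c("E", "D", "C", "B", "A"),
                   right  = FALSE))
}

# Made-up scores for one student (illustration only)
earned <- c(assignments = 170, skills_tests = 340, final_project = 180)
letter_grade(sum(earned))  # 690 points falls in the 640-699 band: "B"
```

Because `letter_grade` is vectorized, it could also be applied to a whole column of point totals at once.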

Assignments

These are shown on the course website. They (along with all necessary data and files) are also available in their respective directories on the GitHub repository. Most will require you to complete a task using R and upload your code as an R script to Canvas and/or GitHub, though some may vary. Assignments will not be accepted for credit after the due date.

Skills Tests

These will be similar to the tasks on the preceding assignments, and they will be open-resource (you can use notes, the internet, etc.). They focus on completing data analysis tasks.

Final Project

Beginning early in the semester, we will decide on individual projects based on personal interest. Working with instructor feedback, you will come up with a question that interests you and identify a data set that can address that question. You will then apply the data exploration and visualization skills you learn in class to prepare a well-formatted report that contains all the code and results of your analyses. You are encouraged to use your own data if you have any. Most importantly, this project will require you to teach yourself new R skills using the resources we learn about (or any others). You will be required to learn analyses explicitly not covered in this course (check the course website to see what we are covering) and to apply them correctly to your analysis and report.

Examples of unique topics could include, but are not limited to:

  • Time-series analyses
  • Genomic Profiling
  • New statistical testing methods
  • Metagenomic assembly
  • Learning a new R package
  • Unique visualization methods
  • Etc.

Your topic/question is due before Skills Test 1.

Soon after, you will work with the instructor to identify a suitable publicly available data set (or use your own).

Then you will work with the instructor to identify a new skill to apply in your project.

Your project will culminate in a single reproducible report that integrates your background research, code, results, figures, and discussion, published on a personal website that you will create.