Intro to Data Analysis
Utah Valley University - BIOL3100
Handy links:
Course file repository
Exams GitHub Repository (Exams will be uploaded at the appropriate times)
R for Data Science Website
Amy Willis’ Intro to R course
(for related alternative exercises/lessons)
Big Book of R collection of free R books …Whoa!
The Command Line, File Paths, Git
Week 1
Topics:
- Installing Software | Command-line | Git version control
Assignments
- Read: What is Git all about?
- Install Git, R, and R-Studio on your laptop (part of Assignment 1)
- Be ready to explain what Git, R, and R-Studio are.
- Do Assignment 1 and upload a link to your new GitHub account to Canvas.
- Take a look at this document to see where this class is going
- Go through ALL the resources below. I put them here for a reason. Most are short web resources or videos (some that I made).
Resources
- Video: Meet the command line of your computer
- Download Git
- Download R
- Download R-Studio
- Navigating with the command line (great video!)
- Setting up 2FA for GitHub
- GitHub steps for Assignment 1 video
- Git Cheat Sheet (handy reference)
- Git tutorial for beginners (another walkthrough, if you need it)
- Best Git Cheat Sheet Ever! (Excellent gift idea for a teacher…joking!)
Practice
- Make 10 more separate changes and commits to your README.md file and push each one to GitHub
- Close and open your command line terminal 10 times
- Open your command line terminal and navigate to your new personal GitHub repository for this course (Data_Course_LASTNAME) / Navigate back to your desktop / From your Desktop (without using “cd”) display the contents of Data_Course_LASTNAME/README.md onto your computer screen.
- Please view this short video clip from “Karate Kid” (Seriously)
- When I tell you to close and open your command line 10 times, it’s not because I hate you.
- It’s because I, too, have had to learn this stuff from scratch
- It’s because I know that repetition is crucial to learning this, especially at the beginning
- And it’s because if you don’t spend the time to do this stuff over and over now, by week 6 you will be drowning and helpless.
- When I say “push 10 separate commits to your GitHub repo,” what I’m actually saying is “Show me ‘Paint the Fence’!”
- Because very soon, Mr. Miyagi will be attacking you with things like “Error in url[i] = paste(df[,2], gsub(" ", "_", : object of type ‘closure’ is not subsettable”
Week 2
Topics
- File paths | Wildcards and pattern matching | Objects | For-Loops
Assignments
Resources
- Intro to what RStudio is video
- Paths and files in R
If you want to know more about the command-line
- Remedial Unix Shell
- Basic Unix Commands
- Very Useful Tutorial
- On the Value of Command-Line Bullshittery
- On the Annoyance of Command-Line Bullshittery
- Video walkthroughs of some command line stuff:
- Part 1 - first commands
- Part 2 - pipes and wildcards
- Part 3 - relative filepaths
- Command line program flags/parameters
- How to avoid two potentially dangerous command line errors
- For-loops video walkthrough in BASH
- Bonus tips:
Practice
- In the directory Data_Course/Data/data-shell/names/ there are a number of subdirectories and csv files. Find all of those csv files and store their full absolute filepaths as a character vector in R.
- Read in and print just the first 2 lines from each of those files
- Find all the .txt files on your entire computer
- Find all files on your computer that contain the character string “es” in the filename
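Here is a minimal sketch of the kind of file-finding these practice items ask for. It builds its own throwaway folder so it runs anywhere; the folder and file names (names_demo, top.csv, and so on) are made up for illustration and are not the course's real data.

```r
# Build a small throwaway directory tree so the example is self-contained.
# (demo_dir and the file names are hypothetical, not the course data.)
demo_dir <- file.path(tempdir(), "names_demo")
dir.create(file.path(demo_dir, "sub_a"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("id,name", "1,Ann", "2,Bob"), file.path(demo_dir, "top.csv"))
writeLines(c("id,name", "3,Cy", "4,Di"), file.path(demo_dir, "sub_a", "nested.csv"))
writeLines("not a csv", file.path(demo_dir, "notes.txt"))

# recursive = TRUE descends into subdirectories.
# full.names = TRUE keeps the starting directory in each returned path.
# pattern is a regular expression: "\\.csv$" means "ends in .csv".
csv_files <- list.files(demo_dir, pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)

# Make sure the paths are absolute, then store them as a character vector.
csv_files <- normalizePath(csv_files)

# Read and print only the first 2 lines of each file.
for (f in csv_files) {
  print(readLines(f, n = 2))
}
```

Note that the pattern argument takes a regular expression, not a shell wildcard, so "*.csv" would not mean the same thing here as it does on the command line.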
Getting to Know R
Week 3
Topics
- R Data types and conversions | Reading and Writing Files | Packages and Projects
Assignments
- Read this chapter on what a “package” is in R
- Read this chapter on R-Projects (We will ALWAYS work from within R-Projects from now on)
- Do Assignment 3 (We will start this one together during class)
Resources
- Data Types in R
- Operators
- Subsetting
- More Subsetting (It’s important!)
- For-Loops
- Reading data into R
- Using ‘pipes’ in R chapter
- For-loops in R chapter
Practice
- Vectors
- Factors
- Characters
- Regular Sequences
- Indexing
- Missing Values
- Loops in R
- Logical Operations
- Find a new built-in data set in R. Use several methods to subset it over and over until you are an expert!
- Out-of-order Code
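To get a feel for the subsetting and type-conversion practice, here is a short sketch using mtcars, a data set that ships with R. The object names (heavy, first_rows, and so on) are just illustrative choices, and this is only a handful of the many ways to subset.

```r
# mtcars is built into R, so nothing needs to be loaded.

# Rows by logical condition, all columns.
heavy <- mtcars[mtcars$wt > 5, ]

# Rows by position, columns by name.
first_rows <- mtcars[1:3, c("mpg", "cyl")]

# One column pulled out as a plain numeric vector.
mpg_vec <- mtcars[["mpg"]]

# subset() does row filtering and column selection in one call.
four_cyl <- subset(mtcars, cyl == 4, select = c(mpg, wt))

# Type conversions: numeric -> factor, and factor -> numeric.
cyl_factor <- as.factor(mtcars$cyl)
# Go through as.character() first; as.numeric() directly on a factor
# returns the internal level codes (1, 2, 3), not the labels (4, 6, 8).
cyl_numeric <- as.numeric(as.character(cyl_factor))

str(heavy)
levels(cyl_factor)
```

The factor-to-numeric step is worth repeating a few times by hand, since skipping as.character() silently gives the wrong numbers.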
Visualizing a Data Set
Week 4
Topics
- “Grammar of Graphics” ggplot | dplyr verbs
Assignments
- Read through the materials in the Resources section below
- Do Assignment 4
Exam 1 (Link at top of page)
Resources
- ggplot Introduction
- dplyr Verbs
- tidyverse Cheat Sheet
- More ggplot
- Extensive ggplot2 Tutorial
- How to plot anything in ggplot part 1 | part 2 (This is a GOOD thing to watch!)
- Evolution of a ggplot tutorial
- Catalog of visualization types (Awesome source of inspiration for your plots…but membership fee required to see code. Booooo)
Practice
- Out-of-order Plotting Code
- ggplot Shiny App (lets you use a GUI to see ggplot code)
- Convert the following code expressions into “pipe format” to make them more readable:
unique(stringr::str_to_title(iris$Species))
max(round(iris$Sepal.Length),0)
mean(abs(rnorm(100,0,5)))
median(round(seq(1,100,0.01),1))
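As a sketch of what “pipe format” means, here is a different nested expression rewritten with a pipe. It assumes R 4.1 or newer, where the native pipe |> is available; the tidyverse pipe %>% reads the same way once the package is loaded. The vector x is made up for the example.

```r
x <- c(1, 4, 9, 16)

# Nested form: you have to read it from the inside out.
nested <- round(sqrt(sum(x)), 1)

# Pipe form: read left to right, one step at a time.
# The value on the left becomes the first argument of the call on the right.
piped <- x |>
  sum() |>
  sqrt() |>
  round(1)

# Both forms give the same answer.
identical(nested, piped)
```

Extra arguments (like the 1 in round) stay inside the parentheses of the function they belong to.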
Week 5
Topics
- More ggplot | ggplot extensions
Assignments
- Assignment 5 - Ugly plot contest!
- Prevent embarrassment… see how NOT to make a chart, except for the Ugly Plot Contest, of course, where you should try to upend good sense.
Resources
- ggimage package
- ggforce package
- patchwork package
- ggpubr package
- gganimate package
- Awesome curated list of ggplot extensions (an overwhelming amount of resources, but if you need something it is probably in here)
Practice
Clean and Transform Data
Week 6
Topics
- Tidy Data | dplyr verbs | tidyr verbs
Assignments
- Read this paper: Tidy Data
- Assignment 6
Resources
Practice
Week 7
Topics
- Data Wrangling | Joins | The Curse of Other People’s Data
Assignments
- Read This Handout
- Read This Paper
- Assignment 7
- Be prepared to discuss your data set for your final project
Resources
- Data Wrangling Chapter
- Wrangling Cheat Sheet
- Visual Explanations of Joins
- Janitor package on CRAN
- Rstats illustrations
- Working with strings and regular expressions using the stringr package
- How Excel actually killed people news article
Practice
- Download this spreadsheet. See if you can figure out all the things wrong with it.
- Error Sleuth Practice
- Data Entry Case Study
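For a quick look at what the different joins do, here is a small sketch using base R's merge() as a stand-in, so it runs without installing anything. The two data frames are invented for illustration. The dplyr functions named in the comments are only rough analogues: merge() sorts rows by the key column, while dplyr joins keep the row order of the left table.

```r
# Two small made-up tables that share a key column, sample_id.
samples <- data.frame(sample_id = c("s1", "s2", "s3"),
                      site = c("A", "B", "A"))
counts <- data.frame(sample_id = c("s1", "s2", "s4"),
                     count = c(10, 25, 7))

# Keep only ids found in BOTH tables (roughly dplyr::inner_join).
inner <- merge(samples, counts, by = "sample_id")

# Keep every row of the left table; unmatched counts become NA
# (roughly dplyr::left_join).
left <- merge(samples, counts, by = "sample_id", all.x = TRUE)

# Keep every id from either table (roughly dplyr::full_join).
full <- merge(samples, counts, by = "sample_id", all = TRUE)

inner
left
full
```

Checking the row counts before and after a join is a cheap way to catch mismatched or duplicated keys in other people's data.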
Getting More From R
Week 8
Topics
- Writing Functions | Conditional Execution | source()
Assignments
- Watch this video from Jenny Bryan about debugging
- Read this chapter and do all the exercises in it as you read
Exam 2 (Link at top of page)
Resources
- Functions
- Conditional Execution
- Functionals and the purrr package chapter
Practice
- Write a function that returns the min, max, and mean of any set of real numbers
- Write a function that takes a data frame and returns a new data frame with one random column removed
- Fix my out-of-order code for a summarizing function
- Write a function that takes a data frame… if there are more than 3 columns, your function should return the column names as-is; if there are 3 or fewer columns, your function should return the column names in reverse order.
- Write a useful function that you might want to use in the future (your choice)
- Put all of these functions into a new R script and save it in your main data course repository
- In a new empty R script, call your functions with source() and test them out
- There’s a stupid function I wrote in “/Code_Examples/thlayli.R”
- It takes a data.frame as an input and does WHAT to it?
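Here is a minimal sketch of the write-a-function, save-it, source()-it workflow. The function spread() and the file name my_functions.R are hypothetical, and the script is written to a temporary folder only so the example is self-contained; in practice you would save the script in your own repository.

```r
# Write a tiny script containing one function definition.
# (spread() and my_functions.R are made-up names for this example.)
script_path <- file.path(tempdir(), "my_functions.R")
writeLines(c(
  "spread <- function(x) {",
  "  # Conditional execution: stop early on bad input.",
  "  if (!is.numeric(x)) stop('x must be numeric')",
  "  # An empty vector has no spread, so return a missing value.",
  "  if (length(x) == 0) return(NA_real_)",
  "  max(x) - min(x)",
  "}"
), script_path)

# source() runs the script, which defines spread() in your workspace.
source(script_path)

spread(c(2, 10, 4))
spread(numeric(0))
```

Keeping functions in their own script and calling source() at the top of an analysis keeps the analysis script short and makes the functions reusable.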
Model Building and Testing
Week 9
Topics
- Building and Testing Models
Assignments
Resources
- Recorded lesson video (Part 1 - Intro to linear models)
- What is a statistical model?
- Modeling Intro
- Model Basics
- Model Fitting
- Model Performance
- Interpreting models with easystats
- Machine learning models explained
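As a first taste of building and comparing models, here is a small sketch with the built-in mtcars data. The choice of variables (mpg, wt, hp) and the use of AIC for the comparison are assumptions made for illustration, not the course's prescribed method.

```r
# Fit a simple linear model: fuel efficiency as a function of weight.
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficients: the intercept and the change in mpg per unit of wt.
coef(fit)

# Proportion of variance in mpg explained by the model.
r2 <- summary(fit)$r.squared

# Predict mpg for two hypothetical cars with wt = 2.5 and wt = 3.5.
new_cars <- data.frame(wt = c(2.5, 3.5))
preds <- predict(fit, newdata = new_cars)

# Fit a second model with an extra predictor, then compare the two.
# Lower AIC suggests a better trade-off between fit and complexity.
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
AIC(fit, fit2)
```

A model is only as useful as its predictions, so it is worth plotting predicted against observed values before trusting any single summary number.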
Practice
Week 10
Topics
- More models | Statistical Tests
Assignments
- Show up to class. Models are confusing at first and there’s a lot to learn.
- Ask questions during class.
Resources
- Linear Regression
- More Linear Regression
- Common Statistical Tests
- Most stats tests are really just linear regression models!
- Everything is a Regression
- Mixed-effect Models
- Introduction to mixed effects models (This is a VERY good paper!)
- How to use lmer
- Comparing multiple models with regression tables
- Intro to Machine Learning online text
- The best way to get into machine learning with R is with the tidymodels package ecosystem
- Free book on tidy modeling with R
- See also the tidyclust package for clustering algorithms
- Free case studies using Machine Learning in R
- Awesome student-made repository featuring good explanations of different GLAMM models
Practice
- Go through the R script more_models.R
- Follow along with my analyses of the first two data sets
- Complete an analysis of the third data set
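To see the “most stats tests are really just linear regression” idea in action, here is a sketch comparing a two-sample t-test with a linear model on the built-in mtcars data. The choice of mpg and am (transmission type) is just for illustration. The match is exact only for the equal-variance t-test, which is why var.equal = TRUE is set; R's default is the Welch test.

```r
# Classic two-sample t-test: does mpg differ by transmission type (am)?
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# The same question asked as a linear model with one 0/1 predictor.
fit <- lm(mpg ~ am, data = mtcars)
lm_row <- summary(fit)$coefficients["am", ]

p_ttest <- tt$p.value
p_lm <- lm_row[["Pr(>|t|)"]]

# The p-values agree.
all.equal(p_ttest, p_lm)

# So do the t statistics, apart from the sign, which only reflects
# which group is subtracted from which.
all.equal(abs(unname(tt$statistic)), abs(lm_row[["t value"]]))
```

The slope for am in the linear model is the difference between the two group means, which is exactly the quantity the t-test is testing.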
Communicating Your Results
Week 11
Topics
- R-Markdown | Reproducible Reports
Assignments
Resources
- Intro to R Markdown
- Markdown Live Preview Generator
- Expert-Level Markdown Project
- Example data analysis webpage
Practice
- Using the resources above, generate a markdown document that analyzes the “iris” data set and push it to a new GitHub repository named Iris_Markdown
- Play with options and code to create a document that looks good and presents your analysis and results clearly
- This is similar to Assignment_9, but I’m asking for a brand new “Iris_Markdown” repository that is a self-contained report of Iris analyses
Week 12
Topics
- Proper Project Organization | “There’s an R-Package for everything”
Assignments
- Peer evaluation of Assignment 9 HTML reports (Organization, Portability, Accuracy, Understandability)
Exam 3 (Link at top of page)
Resources
- Recorded lesson video (Part 1)
- Project-oriented Workflows
- Reproducible, Portable, Self-Contained
- Proper Project Organization Example
- Another, More Detailed Example
- Project organization part 1 video
- Project organization part 2 video
Practice
- Peer evaluations of Iris_Markdown repositories (from last week); Clean them up and make them more organized
Putting it all together
Week 13
Topics
- Data Analysis from raw to report
Assignments
- We will work together in class to do a complete analysis in real-time
- The rest of the semester will focus on live-coding as we work on your final projects
Resources
Week 14
Topics
- Building a website with GitHub and R-Markdown
Assignments
- Work on Final Project
- Create a GitHub Personal Website
- Upload a brief CV and the updated (improved) html of Assignment 9 to your new website
Resources
- Recorded lesson video (Recorded during class and posted after)
- GitHub Pages
- Here’s the GitHub repository for this course website
- Rmarkdown to web page walkthrough video
- Reproducible workflow video
Practice
- Go through my course website repository (link above) and try to relate the code there to the html version of the website your internet browser displays
- Work on your personal website:
- Add multiple pages with internal links
- Be sure to have a “Projects” page that links to HTML reports you’ve made, including your final project
- Be careful not to push any files larger than 50 MB to GitHub (it warns above 50 MB and rejects files over 100 MB) or it will break your repository!
Week 15
Topics
- Intro to genetic data in R
Assignments
- Work on Final Project
- Assignment 10 (Draft of final code)
Week 16
Topics
- TBD
Assignments
Exam 4 (Redo any previous exam to replace its score)
‘Luck is statistics taken personally.’ – Penn Jillette