Intro to Data Analysis

Utah Valley University - BIOL3100

The Command Line, File Paths, Git

Week 1

Topics

  • Installing Software | Command-line | Git version control

Assignments

  • Read: What is Git all about?
  • Install Git, R, and R-Studio on your laptop (part of Assignment 1)
  • Be ready to explain what Git, R, and R-Studio are.
  • Do Assignment 1 and upload a link to your new GitHub account to Canvas.
  • Take a look at this document to see where this class is going
  • Go through ALL the resources below. I put them here for a reason. Most are short web resources or videos (some of which I made).

Resources

Practice

  • Make 10 more separate changes and commits to your README.md file and push each one to GitHub
  • Close and open your command line terminal 10 times
  • Work through these three steps in your command line terminal:
    • Open your command line terminal and navigate to your new personal GitHub repository for this course (Data_Course_LASTNAME)
    • Navigate back to your Desktop
    • From your Desktop (without using “cd”), display the contents of Data_Course_LASTNAME/README.md on your computer screen
  • Please view this short video clip from “Karate Kid” (Seriously)
    • When I tell you to close and open your command line 10 times, it’s not because I hate you.
    • It’s because I, too, have had to learn this stuff from scratch
    • It’s because I know that repetition is crucial to learning this, especially at the beginning
    • And it’s because if you don’t spend the time to do this stuff over and over now, by week 6 you will be drowning and helpless.
    • When I say “push 10 separate commits to your GitHub repo,” what I’m actually saying is “Show me ‘Paint the Fence’!”
    • Because very soon, Mr. Miyagi will be attacking you with things like “Error in url[i] = paste(df[,2], gsub(" ", "_", : object of type 'closure' is not subsettable”


Week 2

Topics

  • File paths | Wildcards and pattern matching | Objects | For-Loops

Assignments

Resources

If you want to know more about the command-line

Practice

  • In the directory Data_Course/Data/data-shell/names/ there are a number of subdirectories and csv files. Find all of those csv files and store their full absolute filepaths as a character vector in R.
  • Read in and print just the first 2 lines from each of those files
  • Find all the .txt files on your entire computer
  • Find all files on your computer that contain the character string “es” in the filename
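
The first two practice items come down to a recursive file listing plus a loop. Here is a minimal sketch, run against a throwaway temporary directory rather than the course repository. The directory layout, the file names, and the helper name `first_lines` are all made up for illustration.

```r
# Build a small throwaway directory tree (hypothetical names, not the course data)
demo_dir <- file.path(tempdir(), "names_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("id,name", "1,Ann", "2,Bob"), file.path(demo_dir, "a.csv"))
writeLines(c("id,name", "3,Cy", "4,Di"), file.path(demo_dir, "sub", "b.csv"))
writeLines("not a csv", file.path(demo_dir, "notes.txt"))

# recursive = TRUE descends into subdirectories;
# full.names = TRUE keeps the directory prefix on each result;
# pattern is a regular expression, so "\\.csv$" means "ends in .csv"
csv_files <- list.files(demo_dir, pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)

# normalizePath() converts each path to an absolute one
csv_files <- normalizePath(csv_files)

# Read only the first n lines of each file with a for-loop
first_lines <- function(paths, n = 2) {
  out <- list()
  for (p in paths) {
    out[[basename(p)]] <- readLines(p, n = n)
  }
  out
}

heads <- first_lines(csv_files)
print(heads)
```

If wildcards feel more natural than regular expressions, `glob2rx("*.csv")` translates a wildcard into the regular expression that `pattern` expects.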

Getting to Know R

Week 3

Topics

  • R Data types and conversions | Reading and Writing Files | Packages and Projects

Assignments

  • Read this chapter on what a “package” is in R
  • Read this chapter on R-Projects (We will ALWAYS work from within R-Projects from now on)
  • Do Assignment 3 (We will start this one together during class)

Resources

Practice


Visualizing a Data Set

Week 4

Topics

  • “Grammar of Graphics” ggplot | dplyr verbs

Assignments

  • Read through the materials in the Resources section below
  • Do Assignment 4

Resources

Practice

unique(stringr::str_to_title(iris$Species))
max(round(iris$Sepal.Length, 0))
mean(abs(rnorm(100,0,5)))
median(round(seq(1,100,0.01),1))
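
Nested calls like the ones above are read from the inside out. As a sketch, and assuming R 4.1 or later for the native pipe `|>`, the last expression can be rewritten as a left-to-right pipeline that gives the same value. That this practice is meant to be done with pipes is an assumption, and the names `nested` and `piped` are made up.

```r
# Nested form: evaluated inside-out (seq, then round, then median)
nested <- median(round(seq(1, 100, 0.01), 1))

# Same steps written left-to-right with the native pipe (R >= 4.1)
piped <- seq(1, 100, 0.01) |>
  round(1) |>
  median()

# TRUE when both forms produce exactly the same number
identical(nested, piped)
```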

Week 5

Topics

  • More ggplot | ggplot extensions

Assignments

Resources

Practice

Clean and Transform Data

Week 6

Topics

  • Tidy Data | dplyr verbs | tidyr verbs

Assignments

Resources

Practice


Week 7

Topics

Assignments

Resources

Practice

Getting More From R

Week 8

Topics

  • Writing Functions | Conditional Execution | source()

Assignments

  • Watch this video from Jenny Bryan about debugging
  • Read this chapter and do all the exercises in it as you read

Resources

Practice

  • Write a function that returns the min, max, and mean of any set of real numbers
  • Write a function that takes a data frame and returns a new data frame with one random column removed
  • Fix my out-of-order code for a summarizing function
  • Write a function that takes a data frame… if there are more than 3 columns, your function should return the column names as-is; if there are 3 or fewer columns, your function should return the column names in reverse order.
  • Write a useful function that you might want to use in the future (your choice)
  • Put all of these functions into a new R script and save it in your main data course repository
  • In a new empty R script, call your functions with source() and test them out
  • There’s a stupid function I wrote in “/Code_Examples/thlayli.R”
    • It takes a data.frame as an input and does WHAT to it?
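
This week's three topics fit together roughly like this: a function lives in its own script, uses `if`/`else` to decide what to return, and gets loaded elsewhere with `source()`. This is only a sketch. The function `size_label`, its `cutoff` argument, and the temporary script file are invented for illustration and are not solutions to the items above.

```r
# Write a tiny script to a temporary file (a stand-in for a saved .R script)
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "size_label <- function(df, cutoff = 100) {",
  "  stopifnot(is.data.frame(df))",
  "  if (nrow(df) > cutoff) {",
  "    'large'",
  "  } else {",
  "    'small'",
  "  }",
  "}"
), script_path)

# source() runs the script, which defines size_label() in this session
source(script_path)

size_label(iris)        # iris has 150 rows, so the first branch runs
size_label(head(iris))  # head() keeps 6 rows, so the else branch runs
```

In your own work the script would be a real file in your repository, and `source()` would take its path instead of a temporary one.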

Model Building and Testing

Week 9

Topics

  • Building and Testing Models

Assignments

Resources

Practice


Week 10

Topics

  • More models | Statistical Tests

Assignments

  • Show up to class. Models are confusing at first and there’s a lot to learn.
  • Ask questions during class.

Resources

Practice

  • Go through the R script more_models.R
    • Follow along with my analyses of the first two data sets
    • Complete an analysis of the third data set

Communicating Your Results

Week 11

Topics

  • R-Markdown | Reproducible Reports

Assignments

Resources

Practice

  • Using the resources above, generate a markdown document that analyzes the “iris” data set and push it to a new GitHub repository named Iris_Markdown
  • Play with options and code to create a document that looks good and presents your analysis and results clearly
  • This is similar to Assignment_9, but I’m asking for a brand new “Iris_Markdown” repository that is a self-contained report of Iris analyses


Week 12

Topics

  • Proper Project Organization | “There’s an R-Package for everything”

Assignments

  • Peer evaluation of Assignment 9 HTML reports (Organization, Portability, Accuracy, Understandability)

Resources

Practice

  • Peer evaluations of Iris_Markdown repositories (from last week); clean them up and make them more organized

Putting it all together

Week 13

Topics

  • Data Analysis from raw to report

Assignments

  • We will work together in class to do a complete analysis in real-time
  • The rest of the semester will focus on live-coding as we work on your final projects

Resources


Week 14

Topics

  • Building a website with GitHub and R-Markdown

Assignments

  • Work on Final Project
  • Create a GitHub Personal Website
  • Upload a brief CV and the updated (improved) HTML of Assignment 9 to your new website

Resources

Practice

  • Go through my course website repository (link above) and try to relate the code there to the html version of the website your internet browser displays
  • Work on your personal website:
    • Add multiple pages with internal links
    • Be sure to have a “Projects” page that links to HTML reports you’ve made, including your final project
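    • Be careful not to push any files larger than 50 MB to GitHub! GitHub warns at that size and rejects any push containing a file over 100 MB, which can leave your repository stuck until the offending commit is undone.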
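
One way to check for oversized files before pushing is to list everything under the repository folder and compare each file's size against a limit. A minimal sketch, where the function name `find_large_files`, its arguments, and the demo files are hypothetical:

```r
# Return the paths of files under `dir` larger than `limit_mb` megabytes
find_large_files <- function(dir = ".", limit_mb = 50) {
  paths <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes_mb <- file.size(paths) / 1024^2   # file.size() reports bytes
  paths[!is.na(sizes_mb) & sizes_mb > limit_mb]
}

# Demo on a throwaway folder with a tiny limit so no huge file is needed
size_demo_dir <- file.path(tempdir(), "size_demo")
dir.create(size_demo_dir, showWarnings = FALSE)
writeLines("x", file.path(size_demo_dir, "small.txt"))
writeLines(rep("0123456789", 2000), file.path(size_demo_dir, "big.txt"))

big <- find_large_files(size_demo_dir, limit_mb = 0.01)  # 0.01 MB is about 10 KB
basename(big)
```

From the top folder of a repository, `find_large_files(".")` would use the default 50 MB limit.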


Week 15

Topics

  • Intro to genetic data in R

Assignments


Week 16

Topics

  • TBD

Assignments

  • Exam 4 (Redo any previous exam to replace its score)



‘Luck is statistics taken personally.’ – Penn Jillette