Usually, the data you analyze will be sitting in an Excel or CSV file. You will have to find it and import it into R using code. You will probably also need to write things out from R, such as statistical tables, graphs, or cleaned data, into new files. All of these tasks require you to be able to navigate your computer's storage. Let’s take a look at a few things we can do within R:
The following command shows us the “path” to our current working directory. A path is like a set of directions for how to get from the very root of your computer to the folder you’re currently working in.
getwd()
## [1] "/home/gzahn/Desktop/GIT_REPOSITORIES/gzahn.github.io/data-course/Repository/Code_Examples"
The leading “/” represents the root of my entire filesystem, and each slash after it separates a directory from the subdirectory inside it.
You can list the files in the current working directory with the following:
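To see how a path breaks down into pieces, here is a small sketch. The folder and file names are invented for illustration and do not need to exist on your computer; file.path(), dirname(), and basename() are base R functions that only manipulate the text of the path.

```r
# Build a path from its pieces; file.path() joins them with "/".
# These folder and file names are made up for illustration.
example_path <- file.path("/home", "someuser", "Desktop", "my_project", "data.csv")
example_path

# Split the path into the directory part and the file-name part
dirname(example_path)   # everything up to the last slash
basename(example_path)  # only the part after the last slash

# Every step along the way, from the root down to the file
strsplit(example_path, "/", fixed = TRUE)[[1]]
```

Building paths with file.path() instead of pasting strings together by hand saves you from forgetting or doubling a slash.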
list.files()[1:10] # just the first 10 to save space
## [1] "animation.html" "animation.Rmd"
## [3] "assign_letter_grades.R" "badplot.jpg"
## [5] "better_than_excel.R" "building_basic_models.R"
## [7] "cleaning_bird_data.R" "custom_images_for_Points.R"
## [9] "dada2_R_example_script.R" "DNA_packages.R"
That command can be modified as well:
list.files(pattern = "x") # just filenames that have "x" in them
## [1] "better_than_excel.R" "dada2_R_example_script.R"
## [3] "exam2_review.R" "Example_day1"
## [5] "Example_Project" "excel_instructions.txt"
## [7] "function_example.R" "handy_bash_aliases.txt"
## [9] "md_example.R" "MeanAD_example.R"
## [11] "plot_examples.R" "plot_examples.Rmd"
## [13] "plots_examples.html" "plots_examples.Rmd"
## [15] "ShortRead_package_example.R" "Vegan_Example"
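Keep in mind that the pattern argument is a regular expression, not a simple wildcard. The sketch below shows a few common variations. It works in a scratch folder created under tempdir(), and the file names are made up, so nothing real on your computer is touched.

```r
# Make a scratch folder with a few empty, made-up files
scratch <- file.path(tempdir(), "pattern_demo")
dir.create(scratch, showWarnings = FALSE)
file.create(file.path(scratch, c("notes.txt", "plot_examples.R",
                                 "plot_examples.Rmd", "R_intro.html")))

# "R" anywhere in the file name
r_anywhere <- list.files(scratch, pattern = "R")

# Only names that END in ".R": escape the dot and anchor with $
r_scripts <- list.files(scratch, pattern = "\\.R$")

# Only names that START with "plot": anchor with ^
starts_with_plot <- list.files(scratch, pattern = "^plot")

r_anywhere
r_scripts
starts_with_plot
```

An unescaped "." in a regular expression matches any single character, which is why the dot is written as "\\." when you mean a literal period.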
You can search within any directory on your computer by telling list.files() which “path” to search in:
list.files(path = "/home/gzahn/Desktop/Bioinformatics/")
## [1] "ENTREZ_QIIME" "ENTREZ_QIIME.zip"
## [3] "Fungal_Alignments" "install_old_R_Packages.R"
## [5] "RDP_Training_Set_ITS2_Outgroups.zip"
list.files(path = "/home/gzahn/Desktop/Bioinformatics/",
           recursive = TRUE,
           pattern = "\\.nex$") # regular expression: names ending in ".nex"
## [1] "Fungal_Alignments/AF48v6dex3.nex"
## [2] "Fungal_Alignments/combined214_nuc.nex"
## [3] "Fungal_Alignments/nuc_5.8S_199_taxa.nex"
## [4] "Fungal_Alignments/nuc_SSU_211_taxa.nex"
Note how “recursive = TRUE” tells it to descend into subdirectories of a given path. Those 4 files live in the “Fungal_Alignments” subdirectory within that path.
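If you want to see the effect of recursive for yourself, here is a minimal sketch. It builds a tiny folder tree under tempdir() with made-up names, then lists the ".nex" files with and without recursion.

```r
# Build a small made-up folder tree in a scratch location
scratch <- file.path(tempdir(), "recursive_demo")
dir.create(file.path(scratch, "alignments"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(scratch, "readme.txt"))
file.create(file.path(scratch, "alignments", c("a.nex", "b.nex")))

# Without recursion, only the top level of the folder is searched
top_only <- list.files(scratch, pattern = "\\.nex$")

# With recursion, subdirectories are searched too, and the
# subdirectory name becomes part of each result
all_levels <- list.files(scratch, pattern = "\\.nex$", recursive = TRUE)

top_only
all_levels
```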
Now a closely related function, list.dirs(), which lists directories instead of files:
mypath <- "~/Desktop/GIT_REPOSITORIES/Data_Course/Data"
list.dirs(path = mypath, recursive = FALSE)
## [1] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/data-shell"
## [2] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/Fastq_16S"
## [3] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights"
## [4] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/Messy_Take2"
You can save this list of directories in case you want to work with it later:
data_directories <- list.dirs(path = mypath, recursive = FALSE)
data_directories[3]
## [1] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights"
list.files(path = data_directories[3],full.names = TRUE)
## [1] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights/2679884.csv"
## [2] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights/2679921.csv"
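The full.names argument matters more than it might seem. Bare file names only make sense relative to the folder they came from, while full names can be handed straight to other functions. Here is a short sketch using a scratch folder under tempdir() and made-up file names:

```r
# Scratch folder with two empty, made-up files
scratch <- file.path(tempdir(), "full_names_demo")
dir.create(scratch, showWarnings = FALSE)
file.create(file.path(scratch, c("a.csv", "b.csv")))

short_names <- list.files(scratch)                    # just the file names
long_names  <- list.files(scratch, full.names = TRUE) # folder plus file name

short_names
long_names

# The full names can be used from anywhere, whatever the working directory is
file.exists(long_names)
```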
You can ask questions about whether files or directories exist in a given location:
file.exists("/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights/2679884.csv")
## [1] TRUE
dir.exists("/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/fights") # misspelled "flights"
## [1] FALSE
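One practical use for these checks is to test for a file before trying to read it, so a typo in a path gives you a clear message. The sketch below is one possible way to do that; read_csv_if_present() is a hypothetical helper written for this example, not a built-in function, and the demo file is created under tempdir().

```r
# Hypothetical helper: read a CSV only if the file is really there
read_csv_if_present <- function(path) {
  if (!file.exists(path)) {
    warning("File not found: ", path)
    return(NULL)
  }
  read.csv(path)
}

# Make a small demo file so there is something to find
demo_file <- file.path(tempdir(), "guard_demo.csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.7)),
          demo_file, row.names = FALSE)

found <- read_csv_if_present(demo_file)

# A path that does not exist gives NULL (and a warning, silenced here)
not_found <- suppressWarnings(
  read_csv_if_present(file.path(tempdir(), "no_such_file.csv"))
)

found
is.null(not_found)
```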
You can create, modify, and peek inside files as well:
list.files(path = data_directories[3],full.names = TRUE)
## [1] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights/2679884.csv"
## [2] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights/2679921.csv"
file.create(file.path(data_directories[3],"testfile")) # Says "TRUE" if it worked
## [1] TRUE
list.files(path = data_directories[3],full.names = TRUE)
## [1] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights/2679884.csv"
## [2] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights/2679921.csv"
## [3] "/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights/testfile"
# be careful using file.remove() ... it's permanent!
file.remove("/home/gzahn/Desktop/GIT_REPOSITORIES/Data_Course/Data/flights/testfile") # Says "TRUE" if it worked
## [1] TRUE
Here are some other functions you should play with:
file.rename()
file.append()
file.copy()
file.size()
readLines()
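A safe place to play with these is a scratch folder under tempdir(), which R cleans up when the session ends. The sketch below tries each function once; the file names and contents are made up for the example.

```r
# Scratch folder and two small made-up text files
scratch <- file.path(tempdir(), "file_functions_demo")
dir.create(scratch, showWarnings = FALSE)

first   <- file.path(scratch, "first.txt")
second  <- file.path(scratch, "second.txt")
backup  <- file.path(scratch, "backup.txt")
renamed <- file.path(scratch, "renamed.txt")

writeLines(c("line one", "line two"), first)
writeLines("line three", second)

file.copy(first, backup, overwrite = TRUE)  # duplicate first.txt
file.append(first, second)                  # add second.txt's contents to the end of first.txt
file.rename(second, renamed)                # second.txt is now called renamed.txt

file.size(first)        # size in bytes
readLines(first)        # contents, one element per line
list.files(scratch)     # what the folder holds now
```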
Do you know what is going on with the next 4 lines of code?
getwd()
## [1] "/home/gzahn/Desktop/GIT_REPOSITORIES/gzahn.github.io/data-course/Repository/Code_Examples"
list.files()[1:10]
## [1] "animation.html" "animation.Rmd"
## [3] "assign_letter_grades.R" "badplot.jpg"
## [5] "better_than_excel.R" "building_basic_models.R"
## [7] "cleaning_bird_data.R" "custom_images_for_Points.R"
## [9] "dada2_R_example_script.R" "DNA_packages.R"
list.files(path = "..",full.names = TRUE)
## [1] "../Assignments" "../Code_Examples" "../Data" "../Exercises"
## [5] "../Tools"
list.files(path = "../Assignments")
## [1] "Assignment_1" "Assignment_10" "Assignment_2"
## [4] "Assignment_3" "Assignment_4" "Assignment_5"
## [7] "Assignment_6" "Assignment_7" "Assignment_8"
## [10] "Assignment_9" "Assignment_DNA_Trees"
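If the ".." looks mysterious: "." is shorthand for the current working directory and ".." means the folder that contains it. Paths written this way are called relative paths. Here is a small sketch that turns them into full paths with normalizePath(); the results depend on wherever your own working directory happens to be.

```r
# "." is the current working directory, ".." is the folder that contains it
here   <- normalizePath(".",  winslash = "/")
parent <- normalizePath("..", winslash = "/")

here
parent

# The folder reached through ".." is the same one dirname() reports
identical(parent, dirname(here))
```

Relative paths keep a project portable: "../Data" points at the right folder on anyone's computer, as long as the project folders are arranged the same way.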
This is all VERY useful once you start working with hundreds or thousands of data files for a given project. Suppose I want to search my entire Desktop, and all the folders inside of it, for FASTA DNA sequence files whose names match a pattern:
mypath <- "~/Desktop"
fastas <- list.files(mypath, recursive = TRUE, pattern = "5\\.8S.*\\.fasta$", full.names = TRUE) # regular expression: any file that has "5.8S" in the name and ends with ".fasta"
fastas
## [1] "/home/gzahn/Desktop/UVU/Journal_Reviews/WNAN_2021/ITS1_all_5.8S.5_8S.fasta"
## [2] "/home/gzahn/Desktop/UVU/Teaching/Courses/Mycology/5.8S.5_8S.fasta"
Since R did all the searching and saved the location of those files, I can have it automatically read them in and work with them. For example:
fna <- ShortRead::readFasta(fastas)
ShortRead::sread(fna)
## DNAStringSet object of length 213:
## width seq
## [1] 158 AAACTTTCAACAACGGATCTCTTGGTTCTGGCA...TTCCGGGGGGCATGCCTGTTCGAGCGTCATTG
## [2] 158 AAACTTTCAACAACGGATCTCTTGGTTCTGGCA...TTCCGGGGGGCATGCCTGTTCGAGCGTCATTA
## [3] 352 AAACTTTCAACAACGGATCTCTTGGCTCTGGCA...CGGATCAGGTAGGGATACCCGCTGAACTTAAG
## [4] 158 AAACTTTCAACAACGGATCTCTTGGTTCTGGCA...TTCCGGGGGGCATGCCTGTTCGAGCGTCATTA
## [5] 158 AAACTTTCAACAACGGATCTCTTGGTTCTGGCA...TTCCGGGGGGCATGCCTGTTCGAGCGTCATTA
## ... ... ...
## [209] 158 AAACTTTCAACAACGGATCTCTTGGTTCTGGCA...TTCCGGAGGGCATGCCTGTCCGAGCGTCATTA
## [210] 158 AAACTTTCAACAACGGATCTCTTGGTTCTGGCA...TTCCGGAGGGCATGCCTGTTCGAGCGTCATTA
## [211] 158 AAACTTTCAACAACGGATCTCTTGGTTCTGGCA...TTCCGAAGGGCATGCCTGTTCGAGCGTCATTG
## [212] 158 AAACTTTCAACAACGGATCTCTTGGCTCTGGCA...TTCCGAAGGGCATGCCTGTCCGAGCGTCATTA
## [213] 158 CAACTTTCAACAACGGATCTCTTGGCTCTCGCA...TTCCGGAGGGCATGCCTGTTTGAGTGTCATGT
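The same find-then-read idea works for ordinary CSV files using only base R. The sketch below is one way to do it, assuming every file has the same columns. The file names and values are invented for the example and are written to a scratch folder under tempdir().

```r
# Write two small made-up CSV files to a scratch folder
scratch <- file.path(tempdir(), "batch_read_demo")
dir.create(scratch, showWarnings = FALSE)
write.csv(data.frame(flight = c("A1", "A2"), delay = c(5, 12)),
          file.path(scratch, "day1.csv"), row.names = FALSE)
write.csv(data.frame(flight = c("B1", "B2", "B3"), delay = c(0, 7, 30)),
          file.path(scratch, "day2.csv"), row.names = FALSE)

# Let R find the files, then read every one of them
csv_paths <- list.files(scratch, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(csv_paths, read.csv)
names(tables) <- basename(csv_paths)  # remember which file each table came from

# Stack the tables into one data frame (assumes identical columns)
combined <- do.call(rbind, tables)
combined
```

Because the list of paths comes from list.files(), the same code keeps working whether the folder holds two files or two thousand.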