sample_01_R1.fasta is a good name.
Sample1R1.fasta is a bad name.
sample 1 forwardreads.fasta is unacceptable and will get you kicked out of class.
Every project you work on should strive to be self-contained. It should also document itself.
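To check a name before you commit to it, here is a minimal Bash sketch. The function name is_safe_name and the allowed character set (letters, digits, ‘.’, ‘_’ and ‘-’) are assumptions, not an official class rule. It only catches the ‘unacceptable’ category (spaces and other special characters), so Sample1R1.fasta still passes even though it is a bad name.

```bash
# Hypothetical helper: succeeds only if the name needs no escaping in Bash.
# Assumption: "safe" = letters, digits, dot, underscore, hyphen.
is_safe_name() {
    local name="$1"
    [[ "$name" =~ ^[[:alnum:]._-]+$ ]]
}

for name in "sample_01_R1.fasta" "Sample1R1.fasta" "sample 1 forwardreads.fasta"; do
    if is_safe_name "$name"; then
        echo "ok:   $name"
    else
        echo "FAIL: $name"    # contains a space or another special character
    fi
done
```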
See this link for a list of special characters.
If any file name or pattern you’re trying to match contains one of these characters and you want it to be interpreted literally, you need to put a backslash directly in front of that character (or wrap the whole name in single quotes).
For example, the file Sample 1.fasta has a space. That’s a special character! (Bash uses spaces to separate one argument from the next.) So to point to this file in a script, you would have to specify Sample\ 1.fasta, which makes Bash treat the space as literally just a space character instead of a separator.
If your search string includes a backslash for some insane but likely reason, you’d have to escape it as well: File\1.fasta has to be written as File\\1.fasta (or quoted as 'File\1.fasta').
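To see escaping in action, the sketch below creates both awkward files in a throwaway directory (from mktemp -d, so nothing real is touched) and addresses them with and without escaping.

```bash
demo_dir=$(mktemp -d)        # scratch directory for the demo
cd "$demo_dir" || exit 1

touch Sample\ 1.fasta        # backslash escapes the space: ONE file name
touch File\\1.fasta          # double backslash = one literal backslash

ls Sample\ 1.fasta           # escaped
ls 'Sample 1.fasta'          # single quotes work too
ls File\\1.fasta
ls 'File\1.fasta'            # inside single quotes, \ is just a character

# Unescaped, Bash splits on the space and looks for TWO files, "Sample" and "1.fasta":
ls Sample 1.fasta 2>/dev/null || echo "ls failed: looked for 'Sample' and '1.fasta'"
```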
Terminal input and output is text-based and flows through 3 main ‘streams.’ Each stream has a name and a number (its file descriptor):
(You can have other streams, but you only need them in special scenarios.)
<stdin> The input text. It can come from keyboard entry, a file, or another program; this is what typically gets fed into a program. AKA: 0< (or just <)
<stdout> The output from a program. Typical default is to send it to the terminal screen. AKA: 1>
<stderr> The error info (if it exists). AKA: 2>
Streams can be redirected in various ways to make them more useful.
> This redirects the <stdout> to a file (overwrites any existing file). Equivalent to 1>
>> This redirects the <stdout> to a file (appends to bottom of any existing file). Equivalent to 1>>
| This sends the <stdout> of one program into the <stdin> of the next. You can chain together many programs, each one processing the previous one’s output in turn… a pipeline.
2> This redirects the <stderr> to a file (overwrites)
3> This redirects <a-third-secret-stream> (file descriptor 3) to a file (overwrites). The file stays empty unless the program deliberately writes to descriptor 3.
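The sketch below exercises each of these on a hypothetical function, talk, that writes one line to stdout and one line to stderr. The function and file names are made up for the demo.

```bash
# Hypothetical function that writes one line to each output stream.
talk() {
    echo "this goes to stdout"          # file descriptor 1
    echo "this goes to stderr" >&2      # file descriptor 2
}

work=$(mktemp -d)                       # scratch directory for the demo

talk > "$work/out.txt" 2> "$work/err.txt"   # > and 2> split the streams into two files
talk >> "$work/out.txt" 2> /dev/null        # >> appends stdout; stderr is thrown away
talk 2> /dev/null | tr 'a-z' 'A-Z'          # | feeds stdout into tr: THIS GOES TO STDOUT
wc -l < "$work/out.txt"                     # < feeds a file into stdin: prints 2

# Descriptor 3 isn't open by default; 3> opens it on a file for this command only.
{ echo "secret stream" >&3; } 3> "$work/three.txt"
```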
These wildcards and special patterns ARE DIFFERENT FROM standard Bash special characters (like the * and ? used to match file names); they are used in Regular Expressions for pattern matching, for example with grep.
symbol | meaning |
---|---|
. | matches exactly one instance of any character |
* | matches the preceding character 0 to infinite times (so .* matches 0 to infinite of ANYTHING) |
? | matches the preceding character 0 or 1 times, i.e. makes it optional (use grep -E) |
[123] | matches exactly one instance of 1, 2, OR 3 (no commas: [1,2,3] would also match a comma) |
x{4} | the preceding character will be matched exactly 4 times (use grep -E, or x\{4\} in plain grep) |
^ | matches BEGINNING OF LINE |
[^a] | matches any character EXCEPT ‘a’ |
$ | matches END OF LINE |
\ | ‘escape character’: the next special character will be interpreted literally instead of its special meaning |
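To try the table out, the sketch below counts how many of six made-up lines match each pattern, using grep -E (extended regex, so ? and {4} need no backslashes). The helper count and the sample lines are hypothetical; the expected counts are in the comments.

```bash
# Six made-up lines to search (printf prints each argument on its own line).
lines=$(printf '%s\n' sample_01_R1.fasta sample_02_R2.fasta sample_1,R1.fasta AAAAT ATTTTG bad.fastq)

# Hypothetical helper: print how many lines match the regex given as $1.
count() { printf '%s\n' "$lines" | grep -cE "$1"; }

count 'sample_0.'    # .     exactly one of any character        -> 2
count 'sample_0?1'   # ?     the 0 is optional                    -> 2
count 'AT*G'         # *     zero or more T's between A and G     -> 1
count 'R[12]'        # [12]  R then exactly one 1 or 2            -> 3
count '1[1,2]'       # the comma is part of the set!              -> 1
count 'A{4}'         # {4}   four A's in a row                    -> 1
count '^sample'      # ^     line starts with "sample"            -> 3
count '^[^s]'        # [^s]  first character is anything but s    -> 3
count 'fasta$'       # $     line ends with "fasta"               -> 3
count '\.fastq'      # \.    a literal dot, not "any character"   -> 1
```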
In Bash, $ denotes ‘the value of a variable.’
For example, you have a variable named PATH, and $PATH is the literal contents of that variable. Your $PATH contains a colon-separated list of directories on your computer that Bash searches, in order, when you type a command name. Any program (like ls, grep, etc.) that lives in one of those directories can be run by name from anywhere on your computer.
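You can inspect your own PATH, and ask Bash where it would find a program, with a couple of standard tools. A minimal sketch; no_such_program_here is a deliberately fake name.

```bash
# PATH is one long colon-separated string; print one directory per line.
echo "$PATH" | tr ':' '\n'

# command -v reports where Bash finds a program by searching those directories in order.
command -v grep

# A name that isn't in any PATH directory prints nothing and returns a nonzero status.
command -v no_such_program_here || echo "not on PATH"
```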
If you want to be able to run a program like seqtk by name once you download it, you need to either move it into a directory that is already in your PATH, or add its location to your $PATH.
If you want to add a new directory to your PATH: See here
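One common way to add a directory is to prepend it with export. The sketch below uses a temporary directory and a fake program, mytool, as stand-ins for something like ~/bin and seqtk; both are hypothetical. An export typed in the terminal only lasts for that session; to make it stick, the same line usually goes in ~/.bashrc (or ~/.bash_profile on some systems).

```bash
tools_dir=$(mktemp -d)    # hypothetical tools directory (think ~/bin)

# A tiny stand-in "program" so there is something to find.
printf '#!/usr/bin/env bash\necho "hello from mytool"\n' > "$tools_dir/mytool"
chmod +x "$tools_dir/mytool"

command -v mytool || echo "mytool not found yet"

# Prepend the directory: Bash searches PATH left to right.
export PATH="$tools_dir:$PATH"

command -v mytool         # now prints the full path
mytool                    # runs by name from any directory
```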
Fasta (usually denoted by a .fasta, .fa, or .fna extension)
These ought to follow a consistent every-other-line format (a header line starting with ‘>’, then the sequence), but in practice the sequence is sometimes wrapped over several lines… because life isn’t fair
>name
Sequence
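Because every record starts with exactly one ‘>’ line, counting records works even when sequences wrap; joining wrapped lines back together takes a little awk. A sketch on a made-up two-record file; unwrap_fasta is a hypothetical helper name.

```bash
fa=$(mktemp)
# Made-up fasta: seq2's sequence is wrapped over two lines.
printf '>seq1\nACGTACGT\n>seq2\nGGGCCC\nAATT\n' > "$fa"

grep -c '^>' "$fa"        # one header per record -> 2

# Join each record's sequence lines into a single line.
unwrap_fasta() {
    awk '/^>/ { if (seq != "") print seq; print; seq = ""; next }
         { seq = seq $0 }
         END { if (seq != "") print seq }' "$1"
}
unwrap_fasta "$fa"        # >seq1 / ACGTACGT / >seq2 / GGGCCCAATT
```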
Fastq (usually denoted by a .fastq or .fq extension)
These ought to follow a standard every-four-lines pattern.
@name
sequence
+
quality scores
Quality score interpretation:
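Assuming the common Phred+33 encoding (used by Sanger and Illumina 1.8+ data), each quality character’s score Q is its ASCII code minus 33, and Q means an error probability of 10^(-Q/10): Q20 is a 1 in 100 chance the base call is wrong, Q30 is 1 in 1000. The helper phred33 below is a hypothetical sketch that decodes a quality string.

```bash
# Hypothetical helper: print the Phred+33 score of each character in a quality string.
phred33() {
    local qual="$1" i ch out=()
    for (( i = 0; i < ${#qual}; i++ )); do
        ch="${qual:i:1}"
        # printf '%d' "'X" prints the ASCII code of X; subtract the 33 offset.
        out+=( $(( $(printf '%d' "'$ch") - 33 )) )
    done
    echo "${out[*]}"
}

phred33 'II5#@'    # -> 40 40 20 2 31
```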
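Because sequence header (name) lines start with ‘@’, and ‘@’ is also a valid quality score character (Q31 in the common Phred+33 encoding), counting sequences is more difficult than with a fasta file: counting lines that start with ‘@’ can overcount, so count all lines and divide by 4 instead.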
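Here is the ‘@’ trap on a made-up two-read file whose second quality line happens to start with ‘@’. Dividing the line count by 4 assumes every record is exactly four lines, as the standard layout above says.

```bash
fq=$(mktemp)
# Made-up fastq: read2's quality line starts with '@'.
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n@@II\n' > "$fq"

grep -c '^@' "$fq"                    # -> 3 (wrong: also counts a quality line)
echo $(( $(wc -l < "$fq") / 4 ))      # -> 2 (right, if every record is 4 lines)
```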
seqtk – This “sequence toolkit” contains a crapload of useful stuff for working with fasta and fastq files
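A typical seqtk job is converting fastq to fasta with seqtk seq -a reads.fq > reads.fa. To show what that conversion amounts to, here is a rough awk stand-in, fastq_to_fasta (a hypothetical name, not part of seqtk), that assumes strict four-line records; seqtk itself handles more cases, so prefer it when it is installed.

```bash
# Hypothetical stand-in for `seqtk seq -a`: fastq in, fasta out.
fastq_to_fasta() {
    # Line 1 of each record: swap the leading @ for >. Line 2: print the sequence.
    # Lines 3 ('+') and 4 (quality) are dropped.
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

fq=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n@@II\n' > "$fq"
fastq_to_fasta "$fq"      # >read1 / ACGT / >read2 / GGCC

# With seqtk on your PATH, the equivalent would be:
# seqtk seq -a "$fq"
```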
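import newimage.png
(An ImageMagick command that gives you a screenshot tool: click a window or drag a box, and the capture is saved as newimage.png.)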