Advances in next-generation sequencing technologies, alongside substantial reductions in the cost of sequencing, have made it possible to measure the expression of thousands of genes across thousands of samples in non-model organisms. Evolutionary genomic methods that compare gene expression differences between populations provide a forward genetic approach to identify the genetic basis of divergent phenotypes. The huge amount of data produced by these experiments makes it impractical to process results on a single desktop machine. Instead, many recent breakthroughs in genomic research were made possible by high-performance computer clusters that allow for fast, parallel processing of many samples at once.
 
 
A computer cluster is a set of connected computers (compute nodes) directed by a head node that runs centralized management software. Farm is a research and teaching cluster for the College of Agricultural and Environmental Sciences at UC Davis that uses the Slurm workload manager to handle jobs submitted by many users.
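As a quick preview, these are the core Slurm commands we will use in this tutorial (all standard Slurm utilities; the script name shown with sbatch is just a placeholder):

sinfo                  # list the partitions (groups of nodes) you can submit to
sbatch my_job.sh       # submit a job script to the scheduler
squeue -u <username>   # check the status of your jobs
scancel <JOBID>        # cancel a running or queued job
srun --pty bash        # start an interactive session on a compute node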
 
 
Here is an example of a job I ran recently using this Slurm script:
#!/bin/bash
#SBATCH --job-name=CA17_angsd_downsample_sfs
#SBATCH --mem=40G
#SBATCH --ntasks=8
#SBATCH -e CA17_angsd_downsample_sfs_%A_%a.err
#SBATCH --time=48:00:00
#SBATCH --mail-user=jamcgirr@ucdavis.edu ##email you when job starts,ends,etc
#SBATCH --mail-type=ALL
#SBATCH -p high
shuf /home/jamcgirr/ph/data/angsd/SFS/bamlist_test/CA17_bams_p_1_5_rm.txt | head -41 > /home/jamcgirr/ph/data/angsd/SFS/downsample/downsample_bams_CA17.txt
/home/jamcgirr/apps/angsd_sep_20/angsd/angsd -bam /home/jamcgirr/ph/data/angsd/SFS/downsample/downsample_bams_CA17.txt -doSaf 1 -doMajorMinor 1 -doMaf 3 -doCounts 1 -anc /home/jamcgirr/ph/data/c_harengus/c.harengus.fa -ref /home/jamcgirr/ph/data/c_harengus/c.harengus.fa -minMapQ 30 -minQ 20 -GL 1 -P 8 -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -trim 0 -C 50 -minInd 10 -setMinDepth 10 -setMaxDepth 100 -out /home/jamcgirr/ph/data/angsd/SFS/downsample/CA17_minQ20_minMQ30
/home/jamcgirr/apps/angsd_sep_20/angsd/misc/realSFS /home/jamcgirr/ph/data/angsd/SFS/downsample/CA17_minQ20_minMQ30.saf.idx -P 8 -fold 1 -nSites 100000000 > /home/jamcgirr/ph/data/angsd/SFS/downsample/CA17_minQ20_minMQ30_folded.sfs
/home/jamcgirr/apps/angsd_sep_20/angsd/misc/realSFS /home/jamcgirr/ph/data/angsd/SFS/downsample/CA17_minQ20_minMQ30.saf.idx -P 8 -nSites 100000000 > /home/jamcgirr/ph/data/angsd/SFS/downsample/CA17_minQ20_minMQ30_unfolded.sfs
#run: sbatch script_CA17_angsd_downsample_sfs.sh
You can see the available partitions and their resource limits with the sinfo command:
sinfo -o "%12P %.5D %.4c %.6mMB %.11l"
PARTITION    NODES CPUS MEMORYMB   TIMELIMIT
low2            34  64+ 256000MB  1-00:00:00
med2            34  64+ 256000MB 150-00:00:0
high2           34  64+ 256000MB 150-00:00:0
low            101   24  64300MB    13:20:00
med*           101   24  64300MB 150-00:00:0
high           101   24  64300MB 150-00:00:0
bigmeml          9  64+ 480000MB 150-00:00:0
bigmemm          9  64+ 480000MB 150-00:00:0
bigmemh          8  64+ 480000MB 150-00:00:0
bigmemht         1   96 970000MB 150-00:00:0
bit150h          1   80 500000MB 150-00:00:0
ecl243           1   80 500000MB 150-00:00:0
bml             18   96 480000MB 150-00:00:0
bmm             18   96 480000MB 150-00:00:0
bmh             18   96 480000MB 150-00:00:0
bgpu             1   40 128000MB 150-00:00:0
bmdebug          2   96 100000MB 3600-00:00:
gpuh             2   48 768000MB  7-00:00:00
gpum             2   48 768000MB  7-00:00:00
We will be submitting our jobs to the ecl243 partition.
 
 
 
 
 
 
 
 
 
 
Today we will learn how to use these resources to run the first step in an RNAseq pipeline. We are going to play with gene expression data from my graduate work with Cyprinodon pupfishes.
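The commands in the rest of this tutorial are run on farm unless noted otherwise. A minimal login sketch, assuming farm accepts SSH connections on port 2022 (the same non-standard port used in the scp commands below):

ssh -p 2022 <username>@farm.cse.ucdavis.edu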
 
 
 
mkdir fastqs
mkdir scripts
 
 
Open up another terminal so that you can interact with your local files.
There are several ways to transfer files between farm and your computer. We will be using scp.
Note: If you find yourself doing a lot of file transfers for your project in the future, there are more convenient tools worth checking out.
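For example, rsync can resume interrupted transfers and copy whole directories. A minimal sketch, assuming the same port 2022 used for scp in this tutorial:

rsync -avz -e "ssh -p 2022" <path/to/local_dir> <username>@farm.cse.ucdavis.edu:~/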
Download slurm_template.sh from Slack and use the scp command to upload it to farm. Whenever you see these symbols '< >', that means you need to change what I have written.
scp -P 2022 <path/to/>slurm_template.sh <username>@farm.cse.ucdavis.edu:~/scripts
Download the .fastq files with wget. The files we will trim are uploaded to my github. These are 150bp paired-end Illumina reads.
Since we are just downloading small files with wget, we can run this on the head node.
cd fastqs/
wget https://github.com/joemcgirr/joemcgirr.github.io/raw/master/tutorials/farm_slurm/CPE1_R1.fastq
wget https://github.com/joemcgirr/joemcgirr.github.io/raw/master/tutorials/farm_slurm/CPE1_R2.fastq
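To sanity check the downloads, remember that each fastq record is four lines (header, sequence, '+', quality string), so you can peek at the first read and count the total number of reads with standard shell tools:

head -4 CPE1_R1.fastq                      # first read
echo $(( $(wc -l < CPE1_R1.fastq) / 4 ))   # number of reads in the file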
 
 
You can see which software is installed on farm with the module avail command:
module avail
----------------------------- /share/apps/modulefiles/lang -----------------------------
aocc/2.1.0          intel/2013         julia/0.6.2  perlbrew/5.16.0  python3/3.6.1
gcc/4.5             intel/2019         julia/0.7.0  pgi/13.3         python3/3.7.4
gcc/4.7.3           ipython2/2.7.16    julia/1.0.0  pgi/13.4         python3/system
gcc/4.9.3           ipython3/3.6.9     julia/1.0.3  proj/4.9.3       R/3.6
gcc/5.5.0           java-jre/1.8.0_20  julia/1.1.0  proj/7.0.1       R/3.6.2
gcc/6.3.1           java/1.8           julia/1.1.1  python/2.7.4     R/3.6.3(default)
gcc/7.2.0           jdk/1.7.0_79       julia/1.2.0  python/2.7.6     R/4.0.2
gcc/7.3.0(default)  jdk/1.8.0.31       julia/1.3.0  python/2.7.14    tools/0.2
gcc/9.2.0           jdk/1.8.0.121      julia/1.3.1  python/2.7.15    udunits/2.2.2
golang/1.13.1       julia/0.6.0        julia/1.4.2  python2/system
----------------------------- /share/apps/modulefiles/hpc ------------------------------
a5miseq/0                        masurca/2.3.1                 velvet/1.2.10
a5miseq/20160825                 masurca/2.3.2                 ViennaRNA/2.1.8
a5pipeline/20130326              masurca/3.1.3                 ViennaRNA/2.4.11
abblast/4Jan2019                 matlab/1-2019a                VirtualGL/2.6.2
abyss/1.3.5                      matlab/7.11                   VirusDetect/1.7
abyss/1.5.1                      matlab/7.13                   vsearch/1.10.1
abyss/1.5.2                      matlab/2016b                  WASP/0.3.4
abyss/1.9.0                      matlab/2017a                  WinHAP2/1
AGOUTI/0.3.3                     matlab/2018b                  wise/2.2.3-rc7
aksrao/3.0                       matplotlib/2.0                wrf/4.0
We will be using the trim_galore module to trim adapters and low-quality bases from our reads.
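If you haven't used environment modules before, these are the basic subcommands (standard for the module system; exact module names vary by cluster):

module load trim_galore     # make trim_galore available in your current session
module list                 # show which modules are currently loaded
module unload trim_galore   # remove it from your environment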
 
 
cd scripts/
cat slurm_template.sh
#!/bin/bash
#SBATCH --job-name=   # create a short name for your job
#SBATCH --ntasks=     # total number of tasks across all nodes
#SBATCH --mem=        # memory to allocate
#SBATCH --time=       # total run time limit (HH:MM:SS)
#SBATCH --partition=  # request a specific partition for the resource allocation
#SBATCH --error       # create a file that contains error messages
#SBATCH --mail-type=  # send email when job begins and ends
#SBATCH --mail-user=email@ucdavis.edu 
Edit slurm_template.sh to match what is shown below and save it as trim_galore.sh
#!/bin/bash
#SBATCH --job-name=trim_galore        # create a short name for your job
#SBATCH --ntasks=1                    # total number of tasks across all nodes
#SBATCH --mem=8G                      # memory to allocate
#SBATCH --time=00:01:00               # total run time limit (HH:MM:SS)
#SBATCH --partition=ecl243            # request a specific partition for the resource allocation
#SBATCH --error trim_galore_%A_%a.err # create a file that contains error messages
#SBATCH --mail-type=ALL               # send email when job begins and ends
#SBATCH --mail-user=<email>@ucdavis.edu
module load trim_galore
trim_galore -q 20 --paired ~/fastqs/CPE1_R1.fastq ~/fastqs/CPE1_R2.fastq
 
 
Submit the job with sbatch:
sbatch trim_galore.sh
Check the status of your job with squeue:
squeue -u <username>
JOBID PARTITION     NAME     USER ST        TIME  NODES CPU MIN_ME NODELIST(REASON)
29741087    ecl243 trim_gal ecl243-0  R        0:10      1 1   8G     bigmem9
If you need to cancel a job, use scancel:
scancel <JOBID>
A file will appear in the directory containing your trim_galore.sh that looks like slurm-<JOBID>.out
This file will contain anything that is written to standard out by trim galore, along with information about your job.
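If you prefer to query job information directly from the scheduler, the standard Slurm accounting command sacct can report similar statistics (a sketch; which fields are populated depends on how accounting is configured on the cluster):

sacct -j <JOBID> --format=JobID,JobName,Partition,State,Elapsed,MaxRSS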
cat slurm-<JOBID>.out
==========================================
SLURM_JOB_ID = 29741087
SLURM_NODELIST = bigmem9
==========================================
1.15
Name                : trim_galore
User                : ecl243-06
Partition           : ecl243
Nodes               : bigmem9
Cores               : 1
GPUs                : 0
State               : COMPLETED
Submit              : 2021-01-22T16:46:26
Start               : 2021-01-22T16:46:26
End                 : 2021-01-22T16:46:49
Reserved walltime   : 00:01:00
Used walltime       : 00:00:23
Used CPU time       : 00:00:02
% User (Computation): 88.00%
% System (I/O)      : 11.95%
Mem reserved        : 8G/node
Max Mem used        : 0.00  (bigmem9)
Max Disk Write      : 0.00  (bigmem9)
Max Disk Read       : 0.00  (bigmem9)
Another file will appear that looks like this: trim_galore_<JOBID_TASKID>.err
head trim_galore_<JOBID_TASKID>.err
Module perlbrew/5.16.0 loaded
 Please be sure your perl scripts hashbang line is #!/usr/bin/env perl
Module trim_galore/1 loaded
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
AUTO-DETECTING ADAPTER TYPE
Trim Galore will also produce CPE1_*.fastq_trimming_report.txt files summarizing what was trimmed from each read file.
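We only trimmed one sample here, but the %A and %a placeholders in the --error filename are Slurm's job array ID and task index, so the same template extends naturally to several samples at once. A sketch, where samples.txt is a hypothetical file in ~/fastqs listing one sample prefix (e.g. CPE1) per line:

#!/bin/bash
#SBATCH --job-name=trim_galore_array
#SBATCH --ntasks=1
#SBATCH --mem=8G
#SBATCH --time=00:10:00
#SBATCH --partition=ecl243
#SBATCH --error trim_galore_%A_%a.err
#SBATCH --array=1-4                        # one task per line in samples.txt

# pick the sample prefix for this array task
sample=$(sed -n "${SLURM_ARRAY_TASK_ID}p" ~/fastqs/samples.txt)

module load trim_galore
trim_galore -q 20 --paired ~/fastqs/${sample}_R1.fastq ~/fastqs/${sample}_R2.fastq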
 
 
 
 
 
 
 
 
 
 
Multiqc is a really awesome quality control tool that can recognize summary files produced by popular bioinformatics software. You simply run multiqc . and the program will look through the current directory, find summary files, and produce interactive plots that can be viewed as an .html report.
Interactive sessions let you move to a compute node where you can test commands and run short jobs. You need to stay logged on the entire time a job runs.
Submitting jobs allows you to log off of the cluster and enjoy your day while your job runs.
!!!!!! BUT FIRST !!!!!!
Start an interactive session with srun:
srun -p ecl243 --mem 8G -c 4 -t 00:10:00 --pty bash
cd
module load multiqc
multiqc .
Download the multiqc_report.html file to a local directory:
scp -P 2022 <username>@farm.cse.ucdavis.edu:multiqc_report.html path/to/Downloads/
 
 
 
 
For more tips on using farm, check out: https://github.com/RILAB/lab-docs/wiki/Using-Farm