For the remainder of the notebook, you will be assigned one or more of the following real files from our dataset:
SRA Sample | Sample Name | File Name |
---|---|---|
SRS1794108 | High-Fat Diet Control 1 | SRR5017135.fastq.gz |
SRS1794110 | High-Fat Diet Control 2 | SRR5017137.fastq.gz |
SRS1794106 | High-Fat Diet Control 3 | SRR5017133.fastq.gz |
SRS1794105 | High-Fat Diet Tumor 1 | SRR5017132.fastq.gz |
SRS1794101 | High-Fat Diet Tumor 2 | SRR5017128.fastq.gz |
SRS1794111 | High-Fat Diet Tumor 3 | SRR5017138.fastq.gz |
In the explanations of the steps, we will first use the file small.fastq.gz. Then, in the exercises portion of the notebook, you will run the analysis on your assigned file(s).
Now we are ready to analyze our data. Move into the ‘seqData’ directory in the project folder. This keeps all of our results together, instead of each of us running the analysis in our own home folder.
Now let's check the contents of our fastq folder. These are the pre-imported files we want to run quality checks on:
ls /home/gea_user/data/raw_data/fastq
As you can see, we have 6 fastq files to analyze. To get the quality data, we will use a program called fastqc. To run the program, we use the command fastqc followed by the name of the file we wish to analyze. First, we will move into the directory where our data are stored:
cd /home/gea_user/data/raw_data/fastq
We will analyze one file to get familiar with the fastqc output (this file is almost 5GB so this may take a few minutes to complete):
fastqc small.fastq.gz
Our output is returned in two files:
ls *.html && ls *.zip
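The two output files follow a predictable naming pattern: FastQC typically takes the input name, drops the .fastq.gz suffix, and appends _fastqc.html and _fastqc.zip. A minimal sketch of that pattern, assuming the input ends in .fastq.gz as the files in this notebook do (the variable names are illustrative only):

```shell
# Derive the report names FastQC is expected to write for one input file.
# Assumption: the input file name ends in .fastq.gz, as in this notebook.
infile="small.fastq.gz"
base=$(basename "$infile" .fastq.gz)   # strip any directory and the .fastq.gz suffix
html_report="${base}_fastqc.html"      # web page with the QC plots
zip_report="${base}_fastqc.zip"        # archive with the report and its data files
echo "$html_report"
echo "$zip_report"
```

Knowing the pattern makes it easier to spot your own sample's reports once several people have written results into the same folder.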
We can make a new directory to place these results in:
mkdir -p /home/gea_user/rna-seq-project/fastqc-untrimmed-results
Now let's move these results to the new directory:
mv *.zip /home/gea_user/rna-seq-project/fastqc-untrimmed-results
mv *.html /home/gea_user/rna-seq-project/fastqc-untrimmed-results
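As an alternative to moving the reports after the fact, FastQC's -o (--outdir) option writes them directly into a chosen directory, which must already exist before the program runs. The sketch below wraps this in a small helper; the function name fastqc_into is hypothetical and not part of FastQC or of this notebook.

```shell
# Hypothetical helper: run FastQC on one file and write the reports
# straight into outdir instead of next to the input file.
fastqc_into() {
  outdir="$1"
  infile="$2"
  mkdir -p "$outdir" || return 1          # FastQC will not create the directory itself
  if ! command -v fastqc >/dev/null 2>&1; then
    echo "fastqc not found on PATH; skipping $infile" >&2
    return 127
  fi
  fastqc -o "$outdir" "$infile"
}

# Usage with this notebook's paths (uncomment to run):
# fastqc_into /home/gea_user/rna-seq-project/fastqc-untrimmed-results small.fastq.gz
```

Either approach ends with the same files in the same place; the mv commands above simply make each step visible.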
You can browse your HTML (webpage) results in the file browser on the left (rna-seq-project > fastqc-untrimmed-results). Click the top-most folder icon to navigate to the home directory for this JupyterLab session.
We will run fastqc on all our files and examine the output in the next notebook.
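Running FastQC on every file can be sketched as a loop over the .fastq.gz files in a directory. The helper below only prints the commands it would run (a dry run), so it is safe to try before committing to a long job; the name plan_fastqc is hypothetical.

```shell
# Hypothetical helper: print the fastqc command for each .fastq.gz file
# in a directory, without running anything.
plan_fastqc() {
  dir="${1:-.}"
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue      # skip the unexpanded pattern when nothing matches
    echo "fastqc $f"
  done
}

# Show what would be run for the current directory.
plan_fastqc .
```

To run the analysis for real, replace echo "fastqc $f" with fastqc "$f". FastQC also accepts several input files in one call, so fastqc *.fastq.gz is an equivalent shortcut.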
As a reminder - in this laboratory you will be assigned one or more of the 6 FASTQ files to follow through the rest of the analysis. The data files from the leptin experiment are:
SRA Sample | Sample Name | File Name |
---|---|---|
SRS1794108 | High-Fat Diet Control 1 | SRR5017135.fastq.gz |
SRS1794110 | High-Fat Diet Control 2 | SRR5017137.fastq.gz |
SRS1794106 | High-Fat Diet Control 3 | SRR5017133.fastq.gz |
SRS1794105 | High-Fat Diet Tumor 1 | SRR5017132.fastq.gz |
SRS1794101 | High-Fat Diet Tumor 2 | SRR5017128.fastq.gz |
SRS1794111 | High-Fat Diet Tumor 3 | SRR5017138.fastq.gz |
Run FastQC on your assigned sample (e.g., if you are assigned High-Fat Diet Control 1, your filename is SRR5017135.fastq.gz). To run the FastQC program, type fastqc, then a space, then the name of the file. Example:
fastqc fastqfile.fastq.gz
Move your FastQC output into the fastqc-untrimmed-results folder. We do this using the mv (move) command below. Let's move all results to our previously created folder:
mv *.zip /home/gea_user/rna-seq-project/fastqc-untrimmed-results
mv *.html /home/gea_user/rna-seq-project/fastqc-untrimmed-results
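After the move, it can be worth confirming that both reports for your sample actually landed in the results folder. A minimal sketch, assuming FastQC's usual output names (sample name plus _fastqc.html and _fastqc.zip); the helper name have_reports is hypothetical, and SRR5017135 stands in for your assigned sample.

```shell
# Hypothetical helper: succeed only if both reports for a sample are in the folder.
have_reports() {
  dir="$1"
  sample="$2"
  [ -f "$dir/${sample}_fastqc.html" ] && [ -f "$dir/${sample}_fastqc.zip" ]
}

# Example with the notebook's paths; replace SRR5017135 with your own sample.
if have_reports /home/gea_user/rna-seq-project/fastqc-untrimmed-results SRR5017135; then
  echo "both reports found"
else
  echo "reports missing"
fi
```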
To view your results, use the file browser to open the rna-seq-project folder and then the fastqc-untrimmed-results folder.