Personal Notes on RNA-seq

1 Overview

Typical RNA-seq data analysis workflow:

  1. Trimming, i.e. removal of the adapter sequences and poor-quality nucleotides. Quality control checks help to indicate whether trimming has been carried out appropriately.
  2. Alignment to a reference genome
  3. Gene quantification and normalization with reference to a file containing gene positions (GTF file)
  4. Differential expression (DE) analysis across conditions

2 Trimming

3 Mapping

3.1 STAR

The --outSAMstrandField intronMotif option adds an XS attribute to the spliced alignments in the BAM file, which is required by Cufflinks for unstranded RNA-seq data (Dobin and Gingeras 2015).
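As a rough sketch of where this option sits in a full STAR call, the command can be assembled in Python before being run. The helper name `build_star_command`, the file paths, the output prefix, and the thread count are illustrative assumptions rather than anything from these notes; the remaining flags are standard STAR options.

```python
import shlex


def build_star_command(genome_dir, read1, read2=None, out_prefix="sample_", threads=4):
    """Assemble a STAR argument list (hypothetical helper; all paths are placeholders)."""
    # Single-end runs pass one FASTQ file, paired-end runs pass two.
    reads = [read1] + ([read2] if read2 else [])
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Adds the XS strand attribute to spliced alignments (needed by Cufflinks
        # when the library is unstranded).
        "--outSAMstrandField", "intronMotif",
        "--outFileNamePrefix", out_prefix,
    ]


cmd = build_star_command("star_index", "reads_1.fastq", "reads_2.fastq")
print(shlex.join(cmd))
```

With STAR installed and a genome index already generated, the list could be executed with `subprocess.run(cmd, check=True)`.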

3.2 HISAT2

HISAT2’s alignment algorithm (Kim et al. 2019) is based on a graph Ferragina-Manzini (GFM) index, which makes HISAT2 faster and more memory-efficient than STAR.

HISAT2 binaries and prebuilt indexes for H. sapiens and a few model organisms are available for download from the official website.
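Once an index has been downloaded (or built), a paired-end alignment might be launched along the following lines. This is a minimal sketch: the helper name `build_hisat2_command`, the index basename, the file names, and the thread count are placeholders assumed for illustration. The flags `-x`, `-1`, `-2`, `-S`, and `-p` are standard HISAT2 options.

```python
import shlex


def build_hisat2_command(index_basename, read1, read2, out_sam, threads=4):
    """Assemble a paired-end HISAT2 argument list (hypothetical helper; paths are placeholders)."""
    return [
        "hisat2",
        "-p", str(threads),    # number of alignment threads
        "-x", index_basename,  # index basename, without the .N.ht2 suffix
        "-1", read1,           # mate 1 reads
        "-2", read2,           # mate 2 reads
        "-S", out_sam,         # SAM output file
    ]


cmd = build_hisat2_command("index/genome", "reads_1.fastq", "reads_2.fastq", "aligned.sam")
print(shlex.join(cmd))
```

If the alignments are destined for Cufflinks, HISAT2 also provides a `--dta-cufflinks` option that can be appended to the list.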

Dobin, Alexander, and Thomas R. Gingeras. 2015. “Mapping RNA-Seq Reads with STAR.” Current Protocols in Bioinformatics 51 (1): 11.14.1–19. https://doi.org/10.1002/0471250953.bi1114s51.